Autonomous Vision Following Drone System

Technical Documentation: How Our Hybrid Tracking System Works

System Overview

Our drone combines computer vision, machine learning, and adaptive control algorithms to create an intelligent following system. The system runs on a Raspberry Pi with a hybrid architecture that balances speed, accuracy, and safety.

👁️ Vision System

Computer Vision Pipeline

Real-time image processing using YOLOv8 and OpenCV for person detection and tracking.

  • YOLOv8 Object Detection (ONNX Runtime)
  • CSRT/KCF Visual Tracking
  • Distance estimation from pixel width
  • Frame stabilization & jump detection

🎯 Control System

Hybrid Adaptive Control

PID + Feedforward controller that adjusts to human movement patterns.

  • Distance-based PID control
  • Speed-matching feedforward
  • Adaptive gain scheduling
  • Emergency backup protocols

🛡️ Safety System

Multi-Layer Safety

Comprehensive safety measures for autonomous operation.

  • Altitude minimum enforcement
  • Emergency distance thresholds
  • State machine recovery
  • Controlled descent algorithms

🚁 Platform Integration

DroneKit + MAVLink

Robust communication with flight controller.

  • DroneKit Python API
  • MAVLink velocity commands
  • Rate-limited communication
  • Guided mode autopilot

How It Works: Step-by-Step Process

Step 1: Camera Capture & Frame Processing

The Raspberry Pi camera captures video at 640x480 resolution, 30 FPS. Each frame is immediately passed to the vision pipeline.

Key Insight: The system uses a minimal buffer size (1 frame) to reduce latency. Frames are resized to 160x160 for YOLO inference while maintaining aspect ratio for accurate distance calculations.
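
The resize step can be sketched as pure geometry. This is illustrative only: the actual resize code is not shown here, and the letterboxing (centred padding) strategy is an assumption; any aspect-preserving fit into the 160x160 input behaves the same way.

```python
def letterbox_params(src_w, src_h, dst=160):
    """Compute scale and padding to fit a frame into a dst x dst square
    while preserving aspect ratio (as done before YOLO inference).

    Returns (scale, pad_x, pad_y): the resized frame is
    (src_w*scale, src_h*scale), centred with pad_x/pad_y pixels of
    border on each side. Dividing a detected pixel width by `scale`
    maps it back to the original frame for distance estimation.
    """
    scale = dst / max(src_w, src_h)            # shrink the longer side to dst
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2                 # horizontal border
    pad_y = (dst - new_h) // 2                 # vertical border
    return scale, pad_x, pad_y
```

For a 640x480 frame this gives a 160x120 image with 20 px of padding above and below.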

Step 2: Person Detection (YOLOv8)

Each frame is processed by the YOLOv8 neural network running via ONNX Runtime. The model detects the "person" class with a confidence threshold of 0.50.

Detection Logic: When multiple persons are detected, the system selects the one closest to the frame center, measured by Euclidean distance.
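
A minimal sketch of this selection logic (the function name and the `(x, y, w, h)` box format are illustrative, not taken from the actual code):

```python
import math

def pick_centered_person(boxes, frame_w=640, frame_h=480):
    """Select the detection whose centre is nearest the frame centre.

    boxes: list of (x, y, w, h) person bounding boxes.
    Returns the chosen box, or None when the list is empty.
    """
    if not boxes:
        return None
    cx, cy = frame_w / 2, frame_h / 2

    def dist_to_center(box):
        x, y, w, h = box
        # Euclidean distance from box centre to frame centre
        return math.hypot(x + w / 2 - cx, y + h / 2 - cy)

    return min(boxes, key=dist_to_center)
```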

Step 3: Visual Tracking & Distance Estimation

Once detected, a CSRT tracker is initialized to follow the person between frames, reducing computational load.

Distance Calculation: Distance = (Person Width × Focal Length) ÷ Pixel Width. Assumes average person width of 0.45m at 1.5m reference distance.
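
The same pinhole relation recovers the focal length from the calibration measurement and then inverts it per frame. The reference pixel width of 190 px below is a hypothetical calibration value, not a number from the actual system:

```python
PERSON_WIDTH_M = 0.45      # assumed average person (shoulder) width
REF_DISTANCE_M = 1.5       # calibration distance from the text
REF_PIXEL_WIDTH = 190.0    # hypothetical pixel width measured at 1.5 m

# pixel_width = PERSON_WIDTH_M * focal / distance
#   =>  focal = REF_PIXEL_WIDTH * REF_DISTANCE_M / PERSON_WIDTH_M
FOCAL_PX = REF_PIXEL_WIDTH * REF_DISTANCE_M / PERSON_WIDTH_M

def estimate_distance(pixel_width):
    """Distance = (Person Width x Focal Length) / Pixel Width."""
    return PERSON_WIDTH_M * FOCAL_PX / pixel_width
```

Halving the observed pixel width doubles the estimated distance, as expected from the inverse relation.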

Step 4: Adaptive Speed Estimation

The system calculates how fast the person is moving by analyzing distance changes over time using a moving window of 5 samples.

Innovation: Person speed is estimated in real-time: positive when moving away, negative when approaching. This enables predictive following.
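
A sketch of the estimator under these assumptions. The class name and the endpoint-difference method are illustrative; the source only specifies the 5-sample window and (later) a 0.05 m/s noise threshold:

```python
from collections import deque

class SpeedEstimator:
    """Estimate person speed from recent distance samples.

    Positive speed = moving away, negative = approaching.
    Speeds below the noise threshold are reported as zero.
    """
    def __init__(self, window=5, noise_threshold=0.05):
        self.samples = deque(maxlen=window)   # (time_s, distance_m) pairs
        self.noise_threshold = noise_threshold

    def update(self, distance_m, time_s):
        self.samples.append((time_s, distance_m))
        if len(self.samples) < 2:
            return 0.0
        (t0, d0), (t1, d1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return 0.0
        speed = (d1 - d0) / (t1 - t0)         # m/s over the window
        return speed if abs(speed) >= self.noise_threshold else 0.0
```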

Step 5: Hybrid Control Calculation

Combines PID controller for distance maintenance with feedforward term for speed matching.

Python: Control Algorithm HybridController.calculate_velocity()
# PID terms for distance maintenance
error = current_distance - DESIRED_DISTANCE_M # 1.0m target
P = 0.8 * error # Proportional
I = 0.25 * integral # Integral with anti-windup
D = 0.1 * derivative # Derivative for damping

# Feedforward: Match person's speed
feedforward = 0.7 * person_speed_est # 70% speed matching

# Combined control signal
raw_velocity = feedforward + (P + I + D)

# Asymmetric limits for safety
if raw_velocity > 0: # Forward
  velocity = clamp(raw_velocity, 0.1, 0.8)
else: # Backward (more conservative)
  velocity = clamp(raw_velocity * 1.3, -0.5, -0.1)

Step 6: Lateral Control & Centering

While maintaining distance, the drone also centers the person in frame by calculating horizontal offset and applying proportional control.

Deadzone: A 12% deadzone prevents unnecessary side-to-side oscillation when the person is nearly centered.
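
A minimal sketch of proportional lateral control with the 12% deadzone. The gain `kp` and the 0.5 m/s lateral limit are assumed values, not taken from the actual code:

```python
def lateral_velocity(box_cx, frame_w=640, kp=0.8, deadzone_frac=0.12,
                     max_vy=0.5):
    """Proportional sideways velocity that re-centres the person.

    The horizontal offset is normalised to [-1, 1]; offsets inside
    the deadzone produce no correction, which prevents side-to-side
    oscillation around the centre.
    """
    offset = (box_cx - frame_w / 2) / (frame_w / 2)   # -1 .. 1
    if abs(offset) < deadzone_frac:
        return 0.0
    vy = kp * offset
    return max(-max_vy, min(max_vy, vy))              # clamp to lateral limit
```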

Step 7: MAVLink Command Execution

Velocity commands are sent to the flight controller via DroneKit at a maximum rate of 20 Hz (rate-limited to prevent flooding the link).

Message Format: SET_POSITION_TARGET_LOCAL_NED messages with body-frame velocities, ignoring position and acceleration for smooth following.
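
The rate limit can be sketched as a thin wrapper that drops commands arriving too soon. The names and structure are illustrative; only the 20 Hz figure comes from the text, and `send_fn` stands in for the actual MAVLink send routine:

```python
import time

class RateLimitedSender:
    """Drop velocity commands that arrive faster than max_hz so the
    MAVLink link is never flooded."""

    def __init__(self, send_fn, max_hz=20.0, clock=time.monotonic):
        self.send_fn = send_fn                # e.g. wraps send_ned_velocity
        self.min_interval = 1.0 / max_hz
        self.clock = clock                    # injectable for testing
        self._last = float("-inf")

    def send(self, vx, vy, vz):
        now = self.clock()
        if now - self._last < self.min_interval:
            return False                      # too soon: command dropped
        self._last = now
        self.send_fn(vx, vy, vz)
        return True
```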

Step 8: State Recovery & Safety Monitoring

Continuous monitoring for tracking loss, jumps, and safety violations with automatic recovery protocols.

Recovery Logic: If tracker fails, system attempts re-detection every 10 frames for up to 30 frames before entering safety stop.
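
The recovery decision can be sketched as a small state function. The `FollowState` names mirror those used in the main control loop; the helper functions themselves are illustrative:

```python
from enum import Enum, auto

class FollowState(Enum):
    FOLLOWING = auto()
    REACQUIRE = auto()
    SAFETY_STOP = auto()

REACQUIRE_MAX_FRAMES = 30   # give up after this many lost frames
REDETECT_EVERY_N = 10       # run re-detection every N lost frames

def should_attempt_redetect(frames_lost):
    """Re-detection runs only every REDETECT_EVERY_N lost frames."""
    return frames_lost % REDETECT_EVERY_N == 0

def recovery_step(frames_lost, detection_found):
    """Decide the next state while the tracker is lost.

    frames_lost: frames elapsed since tracking failed.
    detection_found: whether this frame's re-detection attempt succeeded.
    """
    if detection_found:
        return FollowState.FOLLOWING
    if frames_lost >= REACQUIRE_MAX_FRAMES:
        return FollowState.SAFETY_STOP
    return FollowState.REACQUIRE
```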

Core Algorithms Explained

Adaptive PID Control

The PID controller maintains the desired 1.0m following distance:

  • Proportional (0.8): Immediate response to distance error
  • Integral (0.25): Eliminates steady-state error (anti-windup ±1.5)
  • Derivative (0.1): Damping to prevent overshoot

Gains are asymmetric: backward motion gets 30% higher gain for quicker response when person approaches.
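
A sketch of the PID term with the quoted gains and the ±1.5 anti-windup clamp. The class shape and the derivative-on-error choice are illustrative:

```python
class DistancePID:
    """PID on the distance error with a clamped integral (anti-windup).

    Gains mirror the values quoted above: Kp=0.8, Ki=0.25, Kd=0.1,
    integral clamped to +/-1.5, target distance 1.0 m.
    """
    def __init__(self, kp=0.8, ki=0.25, kd=0.1, i_limit=1.5, target=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.i_limit = i_limit
        self.target = target
        self.integral = 0.0
        self.prev_error = None

    def update(self, distance, dt):
        error = distance - self.target
        # Integrate, then clamp so the term cannot wind up during
        # long periods of saturated output.
        self.integral += error * dt
        self.integral = max(-self.i_limit, min(self.i_limit, self.integral))
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```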

Feedforward Speed Matching

Predictive component that matches the person's walking speed:

  • 70% speed matching: Drone attempts to match 70% of person's estimated speed
  • 5-sample moving window: Speed calculated over recent 5 distance measurements
  • 0.05 m/s threshold: Ignores noise below this movement threshold

This creates "predictive following": the drone anticipates movement rather than just reacting.

⚠️ Safety & Recovery Algorithms

Multi-layer safety system:

  • Emergency backup (0.6m): Force backward at 0.4 m/s if closer than 0.6m
  • Jump detection (100px): Reinitialize if center moves >100px between frames
  • Altitude floor (0.3m): Disable control if below minimum safe altitude
  • Reacquisition (30 attempts): Attempt recovery for 30 frames before stopping
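
These rules can be sketched as simple override functions. The function names are illustrative; the thresholds are the ones listed above:

```python
EMERGENCY_DISTANCE_M = 0.6
EMERGENCY_BACK_SPEED = -0.4    # m/s, forced backward motion
MIN_ALT_M = 0.3                # altitude floor for horizontal control
JUMP_THRESHOLD_PX = 100

def apply_safety(vel_x, distance_m, altitude_m):
    """Override the control output with the safety rules above.

    Altitude floor wins first (no horizontal control near the ground),
    then the emergency backup overrides any forward command.
    """
    if altitude_m < MIN_ALT_M:
        return 0.0
    if distance_m < EMERGENCY_DISTANCE_M:
        return EMERGENCY_BACK_SPEED
    return vel_x

def is_jump(prev_center, center):
    """Treat a large inter-frame centre jump as tracker failure."""
    dx = center[0] - prev_center[0]
    dy = center[1] - prev_center[1]
    return (dx * dx + dy * dy) ** 0.5 > JUMP_THRESHOLD_PX
```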

System Architecture & Code Structure

🧠 Key Design Decisions

Hybrid Architecture: Combines the stability of v3's state machine with adaptive speed matching for natural following behavior.

Headless Operation: Designed for Raspberry Pi without display, using efficient algorithms and minimal dependencies.

Asymmetric Control: Different limits for forward (0.8 m/s) vs backward (0.5 m/s) movement based on safety considerations.

Python: Main Control Loop track_me_hybrid.py (Simplified)
while True:
  # 1. Get camera frame
  ret, frame = camera.read_frame()
  
  if follow_state == FollowState.FOLLOWING:
    # 2. Update tracker
    ok, box = tracker.update(frame)
    
    if ok:
      # 3. Calculate distance & speed
      distance = estimate_distance_from_bbox_width(box_width)
      person_speed = estimate_person_speed(distance, current_time)
      
      # 4. Hybrid control calculation
      vel_x = controller.calculate_velocity(distance, person_speed)
      vel_y = calculate_lateral_control(box_center, frame_center)
      
      # 5. Send to drone if safe
      if altitude > MIN_ALT_FOR_CONTROL:
        send_ned_velocity(vel_x, vel_y, 0.0)
    else:
      # 6. Enter recovery mode
      follow_state = FollowState.REACQUIRE
      reacquire_left = REACQUIRE_MAX_FRAMES

  elif follow_state == FollowState.REACQUIRE:
    # 7. Attempt to re-detect person
    boxes, _ = model.predict(frame, REDETECT_CONF)  # lower confidence for re-detection
    if boxes:
      # Reinitialize tracker and resume following
      tracker.init(frame, best_box)
      follow_state = FollowState.FOLLOWING

Performance Characteristics

  • Target Following Distance: 1.0 m
  • Max Forward Speed: 0.8 m/s
  • Processing Frame Rate: 30 FPS
  • Minimum Safe Altitude: 0.3 m

📊 Real-World Performance Notes

Latency: Total system latency (camera to motor command) is approximately 100-150ms on Raspberry Pi 4.

Power Consumption: Complete system draws ~3A at 5V, with YOLO inference being the most computationally intensive component.

Range: Effective following range is 0.6m to 8m, limited by camera resolution and person detection accuracy.

Safety & Failure Modes

🛡️ Comprehensive Safety Protocol

1. Loss of Tracking

Enter reacquisition mode (up to 30 re-detection attempts at a lower confidence threshold), then enter a safety stop if unsuccessful.

2. Too Close (<0.6m)

Emergency backward maneuver at 0.4 m/s regardless of control signal.

3. Low Altitude (<0.3m)

Disable horizontal control, only allow vertical movement for landing.

4. Jump Detection (>100px)

Assume tracker failure, reinitialize with fresh detection.