Autonomous Vision Following Drone System
Technical Documentation: How Our Hybrid Tracking System Works
Our drone combines computer vision, machine learning, and adaptive control algorithms into an intelligent following system. The system runs on a Raspberry Pi with a hybrid architecture that balances speed, accuracy, and safety.
Real-time image processing using YOLOv8 and OpenCV for person detection and tracking.
PID + Feedforward controller that adjusts to human movement patterns.
Comprehensive safety measures for autonomous operation.
Robust communication with the flight controller.
The Raspberry Pi camera captures video at 640x480 resolution, 30 FPS. Each frame is immediately passed to the vision pipeline.
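A minimal capture loop, assuming the Pi camera is exposed as a V4L2 device through OpenCV (a Picamera2-based setup would be configured equivalently); `process_frame` is a stand-in for the pipeline described below:

```python
import cv2

cap = cv2.VideoCapture(0)  # Pi camera exposed as /dev/video0 via V4L2
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 30)

while True:
    ok, frame = cap.read()    # BGR frame, 640x480
    if not ok:
        continue
    process_frame(frame)      # hand off to the vision pipeline (sketched below)
```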
Each frame is processed by a YOLOv8 neural network running via ONNX Runtime. The model detects the "person" class with a confidence threshold of 0.50.
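A sketch of the detection step, assuming a standard YOLOv8 ONNX export (output shape 1x84x8400); the model filename is an assumption, and the preprocessing skips letterboxing for brevity:

```python
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])

def detect_person(frame, conf_thres=0.50):
    """Return the highest-confidence person box as (x, y, w, h) in frame pixels, or None."""
    h0, w0 = frame.shape[:2]
    img = cv2.resize(frame, (640, 640))                      # model input size
    blob = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    blob = blob.transpose(2, 0, 1)[None]                     # HWC -> NCHW
    out = session.run(None, {session.get_inputs()[0].name: blob})[0]
    preds = out[0].T                                         # (8400, 84): cx, cy, w, h, 80 class scores
    person_scores = preds[:, 4]                              # class 0 is "person" in COCO ordering
    best = int(np.argmax(person_scores))
    if person_scores[best] < conf_thres:
        return None
    cx, cy, bw, bh = preds[best, :4]
    sx, sy = w0 / 640.0, h0 / 640.0                          # map back to capture resolution
    return (int((cx - bw / 2) * sx), int((cy - bh / 2) * sy), int(bw * sx), int(bh * sy))
```

Since only the single best person box is needed, taking the argmax replaces a full NMS pass.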
Once a person is detected, a CSRT tracker is initialized to follow them between frames, which is far cheaper per frame than running the detector continuously.
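A sketch of the detector-to-tracker hand-off (in some OpenCV builds the constructor lives under `cv2.legacy` instead):

```python
tracker = cv2.TrackerCSRT_create()   # cv2.legacy.TrackerCSRT_create() on some builds
tracker.init(frame, box)             # box from detect_person(), as (x, y, w, h)

# On subsequent frames, update() is far cheaper than a full YOLO pass:
ok, box = tracker.update(next_frame)
if not ok:
    pass  # tracking lost -> fall back to detection (see failure handling)
```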
The system calculates how fast the person is moving by analyzing distance changes over time using a moving window of 5 samples.
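The speed estimate can be sketched as a rolling window of timestamped distance samples; the window size of 5 comes from the text, everything else is illustrative:

```python
import time
from collections import deque

class SpeedEstimator:
    """Estimate the person's radial speed (m/s) from distance samples."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # (timestamp, distance) pairs

    def update(self, distance_m):
        self.samples.append((time.monotonic(), distance_m))
        if len(self.samples) < 2:
            return 0.0
        (t0, d0), (t1, d1) = self.samples[0], self.samples[-1]
        return (d1 - d0) / (t1 - t0) if t1 > t0 else 0.0  # positive: moving away
```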
Combines a PID controller for distance maintenance with a feedforward term for speed matching.
While maintaining distance, the drone also centers the person in frame by calculating horizontal offset and applying proportional control.
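Centering can be sketched as a single proportional term on the horizontal pixel offset; the gain K_YAW is an assumed, untuned value:

```python
FRAME_WIDTH = 640
K_YAW = 0.005  # command units per pixel of offset -- illustrative gain

def centering_command(box):
    """Proportional command that steers the person's box toward the frame center."""
    x, _, w, _ = box
    offset_px = (x + w / 2) - FRAME_WIDTH / 2   # positive: person right of center
    return K_YAW * offset_px
```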
Velocity commands are sent to the flight controller via DroneKit at a maximum rate of 20 Hz (rate-limited to prevent flooding the link).
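A sketch of the rate-limited velocity command using DroneKit's MAVLink message factory (the standard SET_POSITION_TARGET_LOCAL_NED pattern from the DroneKit guide); the connection parameters are assumptions:

```python
import time
from dronekit import connect
from pymavlink import mavutil

vehicle = connect("/dev/ttyAMA0", wait_ready=True, baud=921600)  # connection assumed

_last_send = 0.0
MIN_INTERVAL = 1.0 / 20.0  # 20 Hz maximum command rate

def send_velocity(vx, vy, vz):
    """Send a body-frame velocity command, dropping calls that exceed 20 Hz."""
    global _last_send
    now = time.monotonic()
    if now - _last_send < MIN_INTERVAL:
        return  # rate limit: prevent flooding the flight controller
    _last_send = now
    msg = vehicle.message_factory.set_position_target_local_ned_encode(
        0, 0, 0,                                    # time_boot_ms, target system/component
        mavutil.mavlink.MAV_FRAME_BODY_OFFSET_NED,  # velocities relative to drone heading
        0b0000111111000111,                         # type_mask: use velocity fields only
        0, 0, 0,                                    # position (ignored)
        vx, vy, vz,                                 # m/s
        0, 0, 0,                                    # acceleration (ignored)
        0, 0)                                       # yaw, yaw_rate (ignored)
    vehicle.send_mavlink(msg)
```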
Continuous monitoring for tracking loss, jumps, and safety violations with automatic recovery protocols.
The PID controller maintains the desired 1.0 m following distance.
Gains are asymmetric: backward motion gets a 30% higher gain for a quicker response when the person approaches.
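A minimal sketch with the asymmetric boost applied on the backward side; the base gain values are illustrative assumptions, not the tuned values:

```python
class DistancePID:
    """PID on following distance; positive output = move toward the person."""

    def __init__(self, kp=0.9, ki=0.05, kd=0.2, setpoint=1.0, backward_boost=1.3):
        self.kp, self.ki, self.kd = kp, ki, kd    # illustrative gains
        self.setpoint = setpoint                  # desired following distance (m)
        self.backward_boost = backward_boost      # 30% higher gain when backing away
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, distance_m, dt):
        error = distance_m - self.setpoint        # negative: person too close -> back up
        self.integral = max(-1.0, min(1.0, self.integral + error * dt))  # anti-windup
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return out * self.backward_boost if error < 0 else out
```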
Predictive component that matches the person's walking speed.
This creates "predictive following": the drone anticipates the person's movement rather than just reacting to it.
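Combining the two terms is then a one-liner; the feedforward gain K_FF is an assumed value, and the clamp applies the asymmetric 0.8/0.5 m/s limits from the design notes below:

```python
K_FF = 0.8  # fraction of the person's speed fed forward -- illustrative

def forward_command(pid_out, person_speed_mps):
    """PID correction plus speed-matching feedforward, clamped asymmetrically."""
    v = pid_out + K_FF * person_speed_mps   # anticipate, don't just react
    return max(-0.5, min(0.8, v))           # forward 0.8 m/s, backward 0.5 m/s
```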
Multi-layer safety system: continuous monitoring triggers the failure-handling protocols listed at the end of this document.
Hybrid Architecture: Combines the stability of v3's state machine with adaptive speed matching for natural following behavior.
Headless Operation: Designed for Raspberry Pi without display, using efficient algorithms and minimal dependencies.
Asymmetric Control: Different limits for forward (0.8 m/s) vs backward (0.5 m/s) movement based on safety considerations.
Latency: Total system latency (camera to motor command) is approximately 100-150 ms on a Raspberry Pi 4.
Power Consumption: The complete system draws ~3 A at 5 V (about 15 W), with YOLO inference being the most computationally intensive component.
Range: Effective following range is 0.6m to 8m, limited by camera resolution and person detection accuracy.
Failure handling protocols:
Tracking loss: Enter reacquisition mode (30 attempts at a lowered confidence threshold), then enter safety stop if unsuccessful; see the sketch after this list.
Person too close: Emergency backward maneuver at 0.4 m/s regardless of the control signal.
Safety stop: Disable horizontal control; allow only vertical movement for landing.
Tracker jump detected: Assume tracker failure and reinitialize with a fresh detection.
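The reacquisition protocol can be sketched as a simple retry loop; the lowered 0.35 threshold is an assumption, and `detect_person` and the capture object come from the sketches above:

```python
def reacquire(cap, max_attempts=30, low_conf=0.35):
    """Try to re-detect the person at a lowered confidence threshold.

    Returns a fresh bounding box, or None -- in which case the caller
    enters safety stop (horizontal control disabled, land vertically).
    """
    for _ in range(max_attempts):
        ok, frame = cap.read()
        if not ok:
            continue
        box = detect_person(frame, conf_thres=low_conf)
        if box is not None:
            return box
    return None
```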