Object Tracking using YOLO (ONNX) + ByteTrack
This example demonstrates how to perform multi-object tracking in a video using a YOLO ONNX detector and the ByteTrack tracking algorithm.
The system detects objects in each frame and assigns a consistent tracking ID to each object as it moves across frames.
Object tracking allows systems to understand movement patterns and trajectories of objects in video streams.
This is ideal for:
- 🚗 Traffic monitoring systems
- 🏬 Retail customer movement analysis
- 🏭 Industrial object tracking
- 🎥 Smart surveillance systems
- 📊 Motion analytics in video streams
🚀 What This Project Does
The system performs the following steps:
- Detects objects using a YOLO ONNX model
- Tracks objects across frames using ByteTrack
- Assigns a unique ID to each detected object
- Maintains object identity across frames
- Draws bounding boxes and tracking IDs on the video
- Saves the annotated output video
Each detected object is labeled with its tracking ID in the form #ID, for example:
#3
This means the detected object has tracking ID 3 and keeps that ID for as long as it remains visible.
📁 Project Structure
tracking/
├── yolov11x-1280_onnx.py # Main tracking script
├── models/
│ └── yolov11x-1280.onnx # YOLO ONNX detection model
├── data/
│ └── input.mp4 # Example input video
└── README.md # Example documentation
📥 Model Download
Pretrained models for swatahVision are available in the Model Zoo.
🔗 https://visionai4bharat.github.io/swatahVision/model_zoo/
📁 Project Setup
1️⃣ Clone the Repository
git clone https://github.com/VisionAI4Bharat/swatahVision.git
cd swatahVision/examples/tracking
2️⃣ (Optional) Create Virtual Environment
python3 -m venv venv
source venv/bin/activate
3️⃣ Install Dependencies
pip install -r requirements.txt
🚗 Run Object Tracking
Run the tracking script:
python yolov11x-1280_onnx.py \
--source_video_path "data/input.mp4" \
--source_weights_path "models/yolov11x-1280.onnx" \
--classes 0 \
--confidence_threshold 0.3 \
--iou_threshold 0.7
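The script's command-line interface could be wired up with `argparse` roughly as follows. This is a hedged sketch, not the actual script: the flag names and defaults mirror the README, but the default weights path is an assumption taken from the project tree above.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Flags and defaults mirror the README; the weights default is an assumption.
    parser = argparse.ArgumentParser(
        description="YOLO (ONNX) + ByteTrack object tracking"
    )
    parser.add_argument("--source_video_path", required=True,
                        help="Path to input video file")
    parser.add_argument("--source_weights_path",
                        default="models/yolov11x-1280.onnx",
                        help="Path to the YOLO ONNX detection model")
    parser.add_argument("--classes", type=int, nargs="*", default=None,
                        help="Class IDs to track; omit to track all classes")
    parser.add_argument("--confidence_threshold", type=float, default=0.3)
    parser.add_argument("--iou_threshold", type=float, default=0.7)
    return parser

# Parse the same invocation shown above.
args = build_parser().parse_args(
    ["--source_video_path", "data/input.mp4", "--classes", "0"]
)
```

Because `--classes` uses `nargs="*"` with `default=None`, omitting the flag naturally signals "track all classes".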
⚙️ Command Line Arguments
Required
--source_video_path
Path to input video file.
Optional
--source_weights_path
Path to the YOLO ONNX detection model.
--classes
List of class IDs to track.
Example:
--classes 0
If this argument is omitted, all detected classes are tracked.
--confidence_threshold
Detection confidence threshold.
Default: 0.3
--iou_threshold
IoU threshold for Non-Max Suppression.
Default: 0.7
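The two thresholds above govern detection post-processing: the confidence threshold drops weak detections, and the IoU threshold controls Non-Max Suppression. The following is a minimal NumPy sketch of how such filtering is typically applied; the actual script's implementation may differ.

```python
import numpy as np

def iou(box, boxes):
    # box: [x1, y1, x2, y2]; boxes: (N, 4). IoU of `box` against each row.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, conf_thr=0.3, iou_thr=0.7):
    # Drop low-confidence boxes, then greedily keep the highest-scoring box
    # and suppress any remaining box overlapping it above iou_thr.
    mask = scores >= conf_thr
    boxes, scores = boxes[mask], scores[mask]
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps <= iou_thr]
    return boxes[keep], scores[keep]
```

A higher `iou_thr` tolerates more overlap between kept boxes, which helps in crowded scenes but can leave duplicate detections.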
🧠 How It Works (Technical Overview)
Processing Pipeline:
- Load YOLO ONNX object detection model
- Read frames from the input video
- Run object detection on each frame
- Filter detections by selected classes
- Pass detections to the ByteTrack tracker
- Assign unique tracking IDs to objects
- Maintain identity across frames
- Draw bounding boxes and IDs on the video
- Save annotated output video
Multi-object tracking systems aim to estimate object locations and maintain consistent identities across frames in a video.
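The pipeline above can be sketched end to end with pure Python. Note the heavy simplifications: detections here are synthetic tuples rather than model outputs, and a greedy IoU matcher stands in for ByteTrack; the real script uses the actual tracker, the ONNX detector, and OpenCV for video I/O and drawing.

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

class ToyTracker:
    """Greedy IoU matcher standing in for ByteTrack."""
    def __init__(self, iou_thr=0.3):
        self.iou_thr = iou_thr
        self.next_id = 1
        self.tracks = {}               # id -> last seen box

    def update(self, boxes):
        assigned, unmatched = {}, dict(self.tracks)
        for box in boxes:
            # Match each detection to the best-overlapping existing track.
            best = max(unmatched, key=lambda t: iou(unmatched[t], box),
                       default=None)
            if best is not None and iou(unmatched[best], box) >= self.iou_thr:
                assigned[best] = box
                del unmatched[best]
            else:                      # no match: start a new track
                assigned[self.next_id] = box
                self.next_id += 1
        self.tracks = assigned
        return assigned                # id -> box for this frame

def run_pipeline(frames, classes=None, conf_thr=0.3):
    # Each frame is a list of (box, score, class_id) detections.
    tracker = ToyTracker()
    labels_per_frame = []
    for dets in frames:
        boxes = [b for (b, s, c) in dets
                 if s >= conf_thr and (classes is None or c in classes)]
        ids = tracker.update(boxes)
        labels_per_frame.append(sorted(f"#{i}" for i in ids))
    return labels_per_frame
```

Objects that move slightly between frames keep the same ID because their boxes still overlap the previous frame's track.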
🧠 ByteTrack Tracking Algorithm
This example uses ByteTrack, a high-performance multi-object tracking algorithm.
Key features:
- Tracks multiple objects simultaneously
- Maintains consistent IDs across frames
- Handles occlusions and missed detections
- Works in real-time video analytics systems
ByteTrack associates detection boxes across frames to maintain object identities, including the low-confidence boxes that many trackers discard.
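ByteTrack's core idea is a two-stage association: tracks are first matched against high-confidence detections, and the leftover tracks are then matched against low-confidence ones, which often recovers partially occluded objects. The sketch below illustrates only that idea, stripped of ByteTrack's Kalman-filter motion model and track lifecycle management.

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

def byte_associate(tracks, detections, high_thr=0.5, iou_thr=0.3):
    """Two-stage association (the BYTE idea, without the Kalman filter).

    tracks:     dict id -> last box
    detections: list of (box, score)
    Returns dict id -> matched box.
    """
    high = [d for d in detections if d[1] >= high_thr]
    low = [d for d in detections if d[1] < high_thr]
    matched, remaining = {}, dict(tracks)
    for pool in (high, low):       # stage 1: high-score, stage 2: low-score
        for box, _score in pool:
            best = max(remaining, key=lambda t: iou(remaining[t], box),
                       default=None)
            if best is not None and iou(remaining[best], box) >= iou_thr:
                matched[best] = box
                del remaining[best]
    return matched
```

An occluded object whose detection score drops below the high threshold is still associated in the second stage, so its ID survives instead of the track being lost.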
📊 Output
The generated output video contains:
- Bounding boxes around detected objects
- Unique tracking IDs for each object
- Continuous tracking across frames
The annotated video is saved to disk.
Example overlay:
#1
#4
#7
🖥 Model Requirements
This example requires a YOLO ONNX object detection model.
You can:
- Export a model from Ultralytics YOLO
- Use a pre-trained ONNX model
- Use your own trained detection model
Example export:
pip install ultralytics
yolo export model=yolo11x.pt format=onnx imgsz=1280
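Because the model is exported with `imgsz=1280`, each frame must be resized to a 1280×1280 input before inference. YOLO exports conventionally use letterbox preprocessing: scale the frame to fit while keeping its aspect ratio, then pad with gray (value 114). This is a dependency-free sketch of that convention; the real script would use `cv2.resize` with bilinear interpolation rather than the nearest-neighbour indexing used here.

```python
import numpy as np

def letterbox(frame, size=1280, pad_value=114):
    # Scale the longer side to `size`, keeping aspect ratio, then pad
    # the remainder with gray -- the usual YOLO preprocessing convention.
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    nh, nw = round(h * scale), round(w * scale)
    # Nearest-neighbour resize in pure NumPy to keep the sketch dependency-free.
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = frame[rows][:, cols]
    canvas = np.full((size, size, frame.shape[2]), pad_value, dtype=frame.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas, scale, (left, top)
```

The returned `scale` and padding offsets are needed afterwards to map the model's detections back into original-frame coordinates.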
🎯 Practical Applications
- Smart surveillance systems
- Crowd movement analysis
- Traffic monitoring and vehicle tracking
- Industrial object monitoring
- Retail customer behavior analysis
📌 Notes
- GPU recommended for real-time performance
- Works best with stable camera footage
- Supports tracking multiple objects simultaneously
- Tracking accuracy depends on detection quality
- Suitable for both indoor and outdoor video analytics