
Bird Detector: Real-time bird tracking with YOLOv8 and OpenCV

A small Python project that watches a video feed — webcam, MJPEG, RTSP or YouTube — detects birds with YOLOv8, tracks them across frames with persistent IDs and logs how long each one stays in the frame.

The idea

I wanted a tiny computer-vision tool that I could point at any video source and have it tell me which birds visited and how long they stuck around. No web UI, no cloud — just a single Python script that opens a window with annotated boxes and writes a CSV log on the side.

The whole thing lives in a single main.py file. It is a good example of how small the gap between "interesting model" and "useful tool" has become.

Stack & building blocks

  • YOLOv8 (Ultralytics) — pre-trained yolov8s.pt model with the COCO bird class
  • OpenCV — video capture, drawing overlays, the display window
  • argparse — pick between camera, mjpeg, youtube or rtsp sources
  • yt-dlp — resolve a playable URL for YouTube live streams
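To make the argparse part concrete, here is a minimal sketch of what such a CLI could look like. The flag names, defaults, and the `build_parser` / `parse_args` helpers are illustrative assumptions, not the script's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI sketch; every flag name and default here is an assumption."""
    parser = argparse.ArgumentParser(description="Detect and time birds in a video feed")
    parser.add_argument(
        "--source",
        choices=["camera", "mjpeg", "youtube", "rtsp"],
        default="camera",
        help="which kind of video input to open",
    )
    parser.add_argument("--camera-index", type=int, default=0, help="local webcam index")
    parser.add_argument("--mjpeg-url", help="URL of an MJPEG stream")
    parser.add_argument("--youtube-url", help="URL of a YouTube video or live stream")
    parser.add_argument("--rtsp-url", help="URL of an RTSP camera")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # A remote source is useless without its URL, so fail early with a clear message.
    url_flag = {"mjpeg": "mjpeg_url", "youtube": "youtube_url", "rtsp": "rtsp_url"}.get(args.source)
    if url_flag and getattr(args, url_flag) is None:
        parser.error(f"--source {args.source} requires --{url_flag.replace('_', '-')}")
    return args
```

With flags like these, switching from the webcam to a remote feed would only change the arguments on the command line, not the code.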

Anatomy of the script

1. Configurable input sources

A single open_capture() helper switches between input types, so the rest of the loop stays source-agnostic. RTSP is forced onto TCP transport through the OPENCV_FFMPEG_CAPTURE_OPTIONS environment variable, MJPEG uses tighter open and read timeouts, and YouTube goes through yt-dlp to extract a direct media URL.

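import os

import cv2


def open_capture(source_type, camera_index, mjpeg_url, youtube_url, rtsp_url):
    """Open a cv2.VideoCapture for the selected source type."""
    if source_type == "camera":
        return cv2.VideoCapture(camera_index)

    if source_type == "mjpeg":
        # The open/read timeouts are open-time options of the FFmpeg backend,
        # so they are passed to the constructor (needs a recent OpenCV 4.x build).
        cap = cv2.VideoCapture(mjpeg_url, cv2.CAP_FFMPEG, [
            cv2.CAP_PROP_OPEN_TIMEOUT_MSEC, 5000,
            cv2.CAP_PROP_READ_TIMEOUT_MSEC, 10000,
        ])
        cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)  # keep latency low; not every backend honors this
        return cap

    if source_type == "rtsp":
        # Must be set before the capture is created; forces RTSP over TCP.
        os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "rtsp_transport;tcp"
        return cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG)

    if source_type == "youtube":
        stream_url = resolve_youtube_stream(youtube_url)  # helper defined elsewhere in the script
        return cv2.VideoCapture(stream_url)

    raise ValueError(f"unknown source type: {source_type!r}")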
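The resolve_youtube_stream() helper is not shown above. A minimal sketch of how it might work with yt-dlp's Python API follows. The option values and the `pick_stream_url` helper are assumptions made for illustration, and the actual script may select formats differently.

```python
def pick_stream_url(info: dict) -> str:
    """Return a directly playable URL from a yt-dlp info dict."""
    # When a single format was selected, its media URL sits at the top level.
    if info.get("url"):
        return info["url"]
    # yt-dlp usually lists formats from worst to best, so walk them in reverse
    # and skip audio-only entries (their vcodec is the string "none").
    for fmt in reversed(info.get("formats") or []):
        if fmt.get("url") and fmt.get("vcodec") not in (None, "none"):
            return fmt["url"]
    raise RuntimeError("yt-dlp returned no playable video URL")


def resolve_youtube_stream(youtube_url: str) -> str:
    """Ask yt-dlp for stream metadata without downloading anything."""
    import yt_dlp  # imported lazily so the rest of the script runs without it

    options = {"quiet": True, "format": "best"}  # assumed options
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(youtube_url, download=False)
    return pick_stream_url(info)
```

Resolved media URLs tend to expire, so it is probably safer to call the helper again on every reconnect than to cache its result.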

2. Detection with persistent tracking

Each frame is passed through model.track() with persist=True. This is the key flag: it tells Ultralytics to reuse the tracker state from the previous call instead of starting fresh, so track IDs stay stable across frames. That is what lets me answer the dwell-time question rather than just "is there a bird right now".

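results = model.track(
    frame,
    persist=True,             # reuse tracker state from the previous call
    verbose=False,            # no per-frame console output
    classes=[bird_class_id],  # keep only the COCO "bird" class
)[0]  # one input frame, so one Results object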
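Two details around this call are easy to get wrong: where bird_class_id comes from, and what happens when a frame has no tracked birds. The sketch below shows one way to handle both. `find_class_id` and `track_ids` are hypothetical helper names, not functions from the script or from Ultralytics.

```python
def find_class_id(names: dict, label: str = "bird") -> int:
    """Look up a class index by label in a {index: label} mapping such as model.names."""
    for class_id, name in names.items():
        if name == label:
            return int(class_id)
    raise ValueError(f"model has no class named {label!r}")


def track_ids(result) -> list:
    """Return the integer track IDs in one Results object, or [] if none were assigned."""
    ids = result.boxes.id  # None when the tracker assigned no IDs in this frame
    if ids is None:
        return []
    return [int(i) for i in ids.tolist()]
```

With the COCO-trained weights, find_class_id(model.names) should return 14, the index of "bird". Looking it up by name rather than hard-coding the number keeps the script working if the weights are swapped for a custom model.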

3. Dwell-time bookkeeping

A small active_birds dictionary maps each track ID to a first-seen and last-seen timestamp. When a bird is detected for the first time, the script writes a snapshot of the frame to disk (with a cooldown so it doesn't spam the folder). When it has not been seen for EXIT_TIMEOUT seconds, the session is closed and a row is appended to logs/bird_time_log.csv:

track_id,entered_at,left_at,time_spent_seconds
1,2026-05-15_08-14-02,2026-05-15_08-14-31,29.18
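The bookkeeping described above can be sketched in a few small functions. This is an illustration under assumptions rather than the script's code: the EXIT_TIMEOUT value, the dictionary layout, and all function names are made up here. Only the CSV columns and the timestamp format follow the sample row.

```python
import csv
from datetime import datetime
from pathlib import Path

EXIT_TIMEOUT = 3.0  # seconds without a sighting before a visit is closed (assumed value)
STAMP = "%Y-%m-%d_%H-%M-%S"
HEADER = ["track_id", "entered_at", "left_at", "time_spent_seconds"]


def update_active(active_birds: dict, seen_ids, now: float) -> list:
    """Record sightings at time `now`; return the IDs seen for the first time."""
    new_ids = []
    for track_id in seen_ids:
        if track_id not in active_birds:
            active_birds[track_id] = {"first_seen": now, "last_seen": now}
            new_ids.append(track_id)  # the caller can save a snapshot for these
        else:
            active_birds[track_id]["last_seen"] = now
    return new_ids


def pop_expired(active_birds: dict, now: float) -> list:
    """Remove and return (track_id, times) for birds unseen for EXIT_TIMEOUT seconds."""
    expired = [tid for tid, t in active_birds.items() if now - t["last_seen"] >= EXIT_TIMEOUT]
    return [(tid, active_birds.pop(tid)) for tid in expired]


def visit_row(track_id, times: dict) -> list:
    """Format one finished visit as a CSV row matching HEADER."""
    return [
        track_id,
        datetime.fromtimestamp(times["first_seen"]).strftime(STAMP),
        datetime.fromtimestamp(times["last_seen"]).strftime(STAMP),
        f"{times['last_seen'] - times['first_seen']:.2f}",
    ]


def append_rows(log_path, rows) -> None:
    """Append rows to the CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(HEADER)
        writer.writerows(rows)
```

In the main loop this would run once per frame: call update_active() with the IDs from the tracker and time.time(), then pass the output of pop_expired() through visit_row() into append_rows().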

4. Resilient streams

Remote streams disconnect — a lot. The loop tolerates up to max_read_errors consecutive failed reads, then releases the capture and reopens it. Simple, but enough to keep an MJPEG or RTSP feed running for hours without manual intervention.
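A minimal sketch of such a reconnecting read loop, written as a generator. The default values, the `reconnect_delay` pause, and the `read_frames` name are assumptions. The `open_capture` argument here is any zero-argument callable, for example a lambda wrapping the helper shown earlier.

```python
import time


def read_frames(open_capture, max_read_errors=30, reconnect_delay=2.0, sleep=time.sleep):
    """Yield frames indefinitely, reopening the capture after too many failed reads.

    `open_capture` is a zero-argument callable returning an object with
    read() and release(), such as a cv2.VideoCapture.
    """
    cap = open_capture()
    errors = 0
    try:
        while True:
            ok, frame = cap.read()
            if ok:
                errors = 0  # only consecutive failures count
                yield frame
                continue
            errors += 1
            if errors >= max_read_errors:
                cap.release()
                sleep(reconnect_delay)  # give the remote end a moment before retrying
                cap = open_capture()
                errors = 0
    finally:
        cap.release()  # runs when the consumer stops iterating or closes the generator
```

The detection code can then iterate with `for frame in read_frames(...)` and never needs to know that a reconnect happened.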

What I liked about building this

  • The whole pipeline (input → detection → tracking → logging) fits on one screen of Python.
  • Ultralytics' tracking is good enough out of the box — no separate tracker library to wire up.
  • Swapping the source from webcam to a live MJPEG bird-feeder cam is a single CLI flag.
  • The CSV log is boring on purpose: it plays well with pandas, Grafana, or a quick spreadsheet.

What's next

Obvious upgrades: species classification on top of detection, a small dashboard over the CSV log, and a headless mode for running on a Raspberry Pi pointed at the garden. The skeleton is already there — most of those changes are 20–50 lines.

Code & questions

Happy to discuss the trade-offs or share the full script.
