The idea
I wanted a tiny computer-vision tool that I could point at any video source and have it tell me which birds visited and how long they stuck around. No web UI, no cloud — just a single Python script that opens a window with annotated boxes and writes a CSV log on the side.
The whole thing lives in a single main.py file. It is a
good example of how small the gap between "interesting model" and
"useful tool" has become.
Stack & building blocks
- YOLOv8 (Ultralytics): pre-trained yolov8s.pt model with the COCO bird class
- OpenCV: video capture, drawing overlays, the display window
- argparse: pick between camera, mjpeg, youtube or rtsp sources
- yt-dlp: resolve a playable URL for YouTube live streams
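The source selection could be wired up with argparse roughly as follows. This is a minimal sketch, not the actual main.py: the flag names (--source, --camera-index, --mjpeg-url, --youtube-url, --rtsp-url) and the helpers build_parser() and parse_args() are assumptions for illustration; only the four source types come from the list above.

```python
import argparse

# The four source types named in the post; everything else here is assumed.
SOURCE_TYPES = ("camera", "mjpeg", "youtube", "rtsp")

# Hypothetical mapping from source type to the flag that must accompany it.
REQUIRED_URL = {"mjpeg": "mjpeg_url", "youtube": "youtube_url", "rtsp": "rtsp_url"}


def build_parser():
    parser = argparse.ArgumentParser(description="Log bird visits from a video source")
    parser.add_argument("--source", choices=SOURCE_TYPES, default="camera",
                        help="which kind of input to open")
    parser.add_argument("--camera-index", type=int, default=0,
                        help="local camera index (used with --source camera)")
    parser.add_argument("--mjpeg-url", help="MJPEG stream URL")
    parser.add_argument("--youtube-url", help="YouTube live stream URL")
    parser.add_argument("--rtsp-url", help="RTSP stream URL")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Fail early if the chosen source is missing its URL.
    needed = REQUIRED_URL.get(args.source)
    if needed and not getattr(args, needed):
        flag = "--" + needed.replace("_", "-")
        parser.error(f"{flag} is required when --source is {args.source}")
    return args


# Example: parse an explicit argument list instead of sys.argv.
example = parse_args(["--source", "rtsp", "--rtsp-url", "rtsp://camera.local/stream"])
print(example.source, example.rtsp_url)
```

With flags like these, the parsed values map directly onto the arguments of open_capture(), and a missing URL is reported before any capture is opened.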
Anatomy of the script
1. Configurable input sources
A single open_capture() helper switches between input
types so the rest of the loop stays source-agnostic. RTSP gets a
forced TCP transport via the OpenCV / FFmpeg env var, MJPEG uses
tighter open / read timeouts, and YouTube goes through yt-dlp to
extract a direct media URL.
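    import os

    import cv2


    def open_capture(source_type, camera_index, mjpeg_url, youtube_url, rtsp_url):
        if source_type == "camera":
            return cv2.VideoCapture(camera_index)
        if source_type == "mjpeg":
            cap = cv2.VideoCapture(mjpeg_url)
            # Keep the buffer at one frame so reads stay close to live.
            cap.set(cv2.CAP_PROP_BUFFERSIZE, 1)
            # OpenCV documents both timeouts as open-only properties, so on
            # some builds they may only take effect when passed at open time.
            cap.set(cv2.CAP_PROP_OPEN_TIMEOUT_MSEC, 5000)
            cap.set(cv2.CAP_PROP_READ_TIMEOUT_MSEC, 10000)
            return cap
        if source_type == "rtsp":
            # Force RTSP over TCP; must be set before the capture is opened.
            os.environ["OPENCV_FFMPEG_CAPTURE_OPTIONS"] = "rtsp_transport;tcp"
            return cv2.VideoCapture(rtsp_url, cv2.CAP_FFMPEG)
        # Any other source type is treated as YouTube.
        stream_url = resolve_youtube_stream(youtube_url)
        return cv2.VideoCapture(stream_url)

2. Detection with persistent tracking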
Each frame is passed through model.track() with
persist=True. This is the key flag — it tells YOLO to
keep track IDs stable across frames, so I can answer the dwell-time
question rather than just "is there a bird right now".
    results = model.track(
        frame,
        persist=True,
        verbose=False,
        classes=[bird_class_id],
    )[0]

3. Dwell-time bookkeeping
A small active_birds dictionary maps each track ID to a
first-seen and last-seen timestamp. When a bird is detected for the
first time, the script writes a snapshot of the frame to disk (with
a cooldown so it doesn't spam the folder). When it has not been seen
for EXIT_TIMEOUT seconds, the session is closed and a
row is appended to logs/bird_time_log.csv:
    track_id,entered_at,left_at,time_spent_seconds
    1,2026-05-15_08-14-02,2026-05-15_08-14-31,29.18

4. Resilient streams
Remote streams disconnect — a lot. The loop tolerates up to
max_read_errors consecutive failed reads, then releases
the capture and reopens it. Simple, but enough to keep an MJPEG or
RTSP feed running for hours without manual intervention.
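The reconnect logic described above can be sketched as a small generator. This is an illustration under assumptions rather than the loop from main.py: read_with_reconnect(), open_fn, reopen_delay and max_frames are hypothetical names, the default of 30 is made up, and only max_read_errors comes from the script. open_fn is any zero-argument callable returning an object with read() and release(), which is the interface cv2.VideoCapture exposes.

```python
import time


def read_with_reconnect(open_fn, max_read_errors=30, reopen_delay=1.0, max_frames=None):
    """Yield frames, reopening the capture after too many consecutive failed reads.

    max_frames is only there so the generator can be stopped in tests;
    None means run until the caller stops iterating.
    """
    cap = open_fn()
    errors = 0
    yielded = 0
    try:
        while max_frames is None or yielded < max_frames:
            ok, frame = cap.read()
            if ok:
                errors = 0  # any good frame resets the failure streak
                yielded += 1
                yield frame
                continue
            errors += 1
            # Tolerate up to max_read_errors failures in a row, then reopen.
            if errors > max_read_errors:
                cap.release()
                time.sleep(reopen_delay)  # brief pause before reconnecting
                cap = open_fn()
                errors = 0
    finally:
        # Runs when the generator finishes or is closed by the caller.
        cap.release()
```

In the script this would be driven by something like `for frame in read_with_reconnect(lambda: open_capture(...)):`, with the detection and logging steps inside the loop body. Passing a factory instead of an open capture is what lets the helper rebuild the connection on its own.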
What I liked about building this
- The whole pipeline (input → detection → tracking → logging) fits on one screen of Python.
- Ultralytics' tracking is good enough out of the box — no separate tracker library to wire up.
- Swapping the source from webcam to a live MJPEG bird-feeder cam is a single CLI flag.
- The CSV log is boring on purpose: it plays well with pandas, Grafana, or a quick spreadsheet.
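As an example of how little it takes to consume the log, here is a sketch that totals visits and dwell time per day using only the standard library. It assumes the column names and timestamp format shown in the sample row earlier; summarize_visits() is a hypothetical helper, not part of the script.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Timestamp layout taken from the sample row in the post.
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def summarize_visits(csv_file):
    """Return {iso_date: (visit_count, total_seconds)} for a bird_time_log.csv-style file."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in csv.DictReader(csv_file):
        day = datetime.strptime(row["entered_at"], TIMESTAMP_FORMAT).date().isoformat()
        totals[day][0] += 1
        totals[day][1] += float(row["time_spent_seconds"])
    # Sort by date and round the totals for display.
    return {day: (count, round(seconds, 2)) for day, (count, seconds) in sorted(totals.items())}


# Example using the sample row from the post instead of a file on disk.
sample = io.StringIO(
    "track_id,entered_at,left_at,time_spent_seconds\n"
    "1,2026-05-15_08-14-02,2026-05-15_08-14-31,29.18\n"
)
print(summarize_visits(sample))  # {'2026-05-15': (1, 29.18)}
```

For the real log, the same function would take `open("logs/bird_time_log.csv", newline="")` in place of the in-memory sample.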
What's next
Obvious upgrades: species classification on top of detection, a small dashboard over the CSV log, and a headless mode for running on a Raspberry Pi pointed at the garden. The skeleton is already there — most of those changes are 20–50 lines.