Real-time Object Detection with YOLO and Webcam: Enhancing Your Computer Vision Skills

Learn How to Build Your Own Object Detection System with YOLO and Webcam Integration for Real-time Monitoring and Analysis.

Dipankar Medhi
5 min read · Mar 27, 2023
Object detection with OpenCV and YOLO

Object detection has become an increasingly popular field in computer vision, with YOLO (You Only Look Once) being one of the most widely used algorithms. In this blog post, we will explore how to use YOLO and a webcam to get started with a real-time object detection system.

YOLO was developed by Joseph Redmon and his collaborators at the University of Washington.

Unlike earlier detectors that run a classifier over many candidate regions of an image, YOLO predicts all bounding boxes and class probabilities in a single forward pass over the entire image, which makes it fast enough for real-time use.
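To make the single-pass idea concrete: in the original paper, the network divides the image into an S × S grid, and each cell predicts B boxes (four coordinates plus one confidence score each) together with C class probabilities. Everything comes out as one S × S × (B·5 + C) tensor. Here is a small sketch of that arithmetic using the paper's PASCAL VOC settings (S=7, B=2, C=20). The function name is only illustrative and not part of any library.

```python
def yolo_v1_output_shape(grid_size: int, boxes_per_cell: int, num_classes: int) -> tuple:
    """Shape of the YOLOv1 prediction tensor: S x S x (B*5 + C)."""
    # Each predicted box contributes x, y, w, h and one confidence score (5 numbers).
    depth = boxes_per_cell * 5 + num_classes
    return (grid_size, grid_size, depth)


shape = yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20)
print(shape)  # (7, 7, 30)
```

Later YOLO versions, including the YOLOv8 model used below, change the prediction head, so treat this as an illustration of the original design only.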

YOLO has been used in a variety of applications, including self-driving cars, security systems, and image and video analysis.

YOLO has been implemented in several deep learning frameworks, including Darknet, TensorFlow, and PyTorch. The original implementation of YOLO was done using the Darknet framework, which was developed by Joseph Redmon.

Here are some references for YOLO and its implementation:

Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. “You Only Look Once: Unified, Real-Time Object Detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016. https://arxiv.org/abs/1506.02640

Darknet repository: https://github.com/pjreddie/darknet

TensorFlow implementation of YOLO: https://github.com/hizhangp/yolo_tensorflow

PyTorch implementation of YOLO: https://github.com/marvis/pytorch-yolo2

🌎 Set up the environment

To begin, we need to set up our environment: a Python installation with OpenCV, a popular computer vision library, and the ultralytics package, which provides YOLO.

  1. Create a main.py file that imports OpenCV and sets up the base for detecting through the webcam.
  2. Install the dependencies: opencv-python now, and ultralytics a little later in this post.
$ pip install opencv-python
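Before moving on, it can help to confirm which modules Python can actually find. The sketch below uses only the standard library. The helper name missing_packages is made up for this example, and "cv2" is the import name that the opencv-python package installs.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# "cv2" comes from opencv-python; "ultralytics" is imported under its own name.
missing = missing_packages(["cv2", "ultralytics"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If ultralytics shows up as missing at this point, that is expected until it is installed in the YOLO section.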

📷 Start capturing from the webcam

  1. We capture frames from the webcam using OpenCV's VideoCapture class.
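import cv2

cap = cv2.VideoCapture(0)  # 0 selects the default camera
cap.set(3, 640)  # property 3 is the frame width
cap.set(4, 480)  # property 4 is the frame height

while True:
    ret, img = cap.read()
    if not ret:  # stop if no frame could be read
        break
    cv2.imshow('Webcam', img)

    # exit when the user presses 'q'
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()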

  2. In the above code, we create a VideoCapture object that captures frames from the default camera (0) and set the webcam resolution to 640x480. We then loop over the frames and display them in a window until the user presses ‘q’ to exit.
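A webcam can drop a frame or be unavailable, in which case cap.read() returns False as its first value. One way to handle this, as a rough sketch, is to wrap the reads in a generator that tolerates a few failures before giving up. The names read_frames, max_failures, and FakeCapture are hypothetical. FakeCapture only stands in for a camera so the example runs without hardware.

```python
def read_frames(cap, max_failures=5):
    """Yield frames from a capture-like object until reads fail repeatedly.

    `cap` only needs a read() method returning (ok, frame), as cv2.VideoCapture does.
    """
    failures = 0
    while failures < max_failures:
        ok, frame = cap.read()
        if not ok:
            failures += 1  # tolerate a few dropped frames before giving up
            continue
        failures = 0  # a good frame resets the counter
        yield frame


class FakeCapture:
    """Stand-in for a camera so this sketch runs without a webcam."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


frames = list(read_frames(FakeCapture(["frame-1", "frame-2"]), max_failures=3))
print(frames)  # ['frame-1', 'frame-2']
```

With a real camera, the same generator could be fed cv2.VideoCapture(0), ideally after checking cap.isOpened() first.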

🔨 Operating YOLO with ultralytics

  1. We install the ultralytics library that makes working with YOLO very easy and hassle-free.
$ pip install ultralytics

  2. We load the YOLO model with the ultralytics library, passing it the location of the weights file, yolo-Weights/yolov8n.pt.

from ultralytics import YOLO
model = YOLO("yolo-Weights/yolov8n.pt")

  3. We define a classNames variable containing the list of object classes that the YOLO model is trained to detect. The model reports each detection as an index into this list.

classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
"dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
"baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
"fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
"diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush"
]
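Since the model reports classes as integer indices, an unexpected index would raise an IndexError when used directly on the list. A defensive lookup might look like the sketch below. The helper label_for and its fallback value are assumptions made for illustration. It also accepts a dict, because Ultralytics model objects expose their own index-to-label mapping as model.names, whose spellings may differ slightly from the list above.

```python
def label_for(class_id, names, fallback="unknown"):
    """Look up a label by class index in a list or an index-to-label dict."""
    if isinstance(names, dict):
        return names.get(class_id, fallback)
    if 0 <= class_id < len(names):
        return names[class_id]
    return fallback


sample_names = ["person", "bicycle", "car"]  # shortened list, for illustration only
print(label_for(2, sample_names))   # car
print(label_for(99, sample_names))  # unknown
```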

  4. Inside the while loop, we read each frame from the webcam using cap.read() and pass it to the YOLO model for object detection. The detection results are stored in the ‘results’ variable.

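import cv2

cap = cv2.VideoCapture(0)
cap.set(3, 640)  # frame width
cap.set(4, 480)  # frame height

while True:
    ret, img = cap.read()
    if not ret:  # stop if no frame could be read
        break
    # stream=True yields the results lazily, as a generator
    results = model(img, stream=True)

    cv2.imshow('Webcam', img)

    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()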

  5. For each result, the code extracts the bounding box coordinates of every detected object and draws a rectangle around it using cv2.rectangle(). It also prints the confidence score and class name to the console and writes the class name onto the frame with cv2.putText().
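The box handling in this step can be sketched in isolation, without a model or camera. The two helpers below, to_pixel_box and format_label, are hypothetical names for this example. They convert floating-point corner coordinates to integers clamped to the frame, and build a text label that includes the confidence.

```python
def to_pixel_box(xyxy, frame_width, frame_height):
    """Convert float (x1, y1, x2, y2) corners to ints clamped to the frame."""
    x1, y1, x2, y2 = (int(v) for v in xyxy)
    x1 = max(0, min(x1, frame_width - 1))
    x2 = max(0, min(x2, frame_width - 1))
    y1 = max(0, min(y1, frame_height - 1))
    y2 = max(0, min(y2, frame_height - 1))
    return x1, y1, x2, y2


def format_label(name, confidence):
    """Build the text drawn next to a box, e.g. 'cell phone 0.87'."""
    return f"{name} {confidence:.2f}"


# Made-up coordinates that spill outside a 640x480 frame.
print(to_pixel_box((12.7, -3.2, 700.4, 250.9), 640, 480))  # (12, 0, 639, 250)
print(format_label("cell phone", 0.874))  # cell phone 0.87
```

Clamping is optional for drawing, since OpenCV clips shapes to the image, but it matters if the coordinates are later used to crop the frame.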

📃 Complete code — object detection with YOLO and webcam

from ultralytics import YOLO
import cv2
import math
# start webcam
cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

# model
model = YOLO("yolo-Weights/yolov8n.pt")

# object classes
classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
"traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
"dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
"handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
"baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
"fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
"carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
"diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
"teddy bear", "hair drier", "toothbrush"
]


while True:
    success, img = cap.read()
    if not success:  # stop if no frame could be read
        break
    results = model(img, stream=True)

    # coordinates
    for r in results:
        boxes = r.boxes

        for box in boxes:
            # bounding box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values

            # put box in cam
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

            # confidence, rounded up to two decimal places
            confidence = math.ceil(float(box.conf[0]) * 100) / 100
            print("Confidence --->", confidence)

            # class name
            cls = int(box.cls[0])
            print("Class name -->", classNames[cls])

            # object details
            org = (x1, y1)
            font = cv2.FONT_HERSHEY_SIMPLEX
            fontScale = 1
            color = (255, 0, 0)
            thickness = 2

            cv2.putText(img, classNames[cls], org, font, fontScale, color, thickness)

    cv2.imshow('Webcam', img)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
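The loop above draws every detection, including low-confidence ones. A common customization is to keep only detections above a threshold. The sketch below shows the idea on plain tuples. The function name, the 0.5 default, and the sample values are all made up for illustration and are not output from the model.

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep (label, confidence, box) tuples whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= min_confidence]


# Illustrative sample values only, not real model output.
detections = [
    ("person", 0.91, (10, 20, 200, 400)),
    ("cell phone", 0.34, (120, 150, 180, 260)),
    ("cup", 0.62, (300, 310, 360, 400)),
]

kept = filter_detections(detections, min_confidence=0.5)
print([label for label, _, _ in kept])  # ['person', 'cup']
```

In the main loop, the same check could be applied to the rounded confidence value before calling cv2.rectangle() and cv2.putText().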

🔥 Result

Me holding a smartphone 📱

webcam detection

🖐 Conclusion

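In this post, we walked through the steps to implement YOLO webcam detection using Python and OpenCV: capturing frames, loading a YOLOv8 model with ultralytics, and drawing the detected boxes and class names on each frame. By following these steps, you can build your own object detection system and customize it to suit your specific needs.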

Overall, YOLO webcam detection is a fascinating area of computer vision with numerous possibilities for future advancements and applications.

Happy coding 💛
