Live Object Detection: A Python Tutorial

Welcome back, tech enthusiasts! In today’s article, we will delve into the exciting world of live object detection using Python. We will explore how to implement an object recognition tool, step by step. So, let’s dive right in!

Setting Up the Project Structure

Before we begin, let’s set up our project structure. Installing a library and writing some code is not enough for this project: we will use a pre-trained object recognition model, which comes as two external files (a .prototxt file that describes the network and a model file that holds the trained weights).

To get started, head to github.com/user/chu-wonkey-305 and download the “mobile_net_ssd_deploy.prototxt” and “mobile_net_ssd.caffe.model” files. Place these files in a models/ folder next to your main Python file, so that they match the paths used in the code below.

Additionally, put any images you like into an images/ folder. Images of rooms with people or of streets, for instance, work great for object recognition.

Here’s an overview of your project structure:

- main.py
- models/
    - mobile_net_ssd_deploy.prototxt
    - mobile_net_ssd.caffe.model
- images/
    - room_people.jpeg
    - street.jpeg
    - ...
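
Before going further, it can help to confirm that the model files are where the code will look for them. The following is a minimal sketch, assuming the folder layout above; the helper name missing_files is made up for this example and is not part of OpenCV.

```python
from pathlib import Path

# Files the tutorial expects, relative to the folder that contains main.py
REQUIRED_FILES = [
    "models/mobile_net_ssd_deploy.prototxt",
    "models/mobile_net_ssd.caffe.model",
]


def missing_files(project_root="."):
    """Return the required files that are not present under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are in place.")
```

Running this from the project folder before main.py gives a clearer message than the error OpenCV raises when a model file cannot be opened.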

Note: We would like to acknowledge the invaluable contributions of the repositories we referenced for this project, such as github.com/user/chu-wonkey-305, and the website “Pi Image Search” for their knowledge and inspiration.

Installing Required Libraries

Now that our project structure is set up, let’s install the necessary libraries. Open your command prompt and install the following libraries:

pip install numpy
pip install opencv-python

Note: If you have already installed these libraries, there is no need to reinstall them.
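
If you are not sure whether the libraries are already installed, a quick check like the one below can tell you. This is only a sketch: the helper name is_installed is made up for this example. Note that the opencv-python package is imported under the name cv2.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# opencv-python is imported as "cv2", so that is the name to look for
for module_name in ("numpy", "cv2"):
    status = "found" if is_installed(module_name) else "missing"
    print(f"{module_name}: {status}")
```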

Implementing Object Detection

With the project structure and libraries ready, it’s time to implement the object detection functionality. We will be using OpenCV, a popular computer vision library, to accomplish this.

To get started, import the required libraries within your Python script:

import cv2
import numpy as np

Next, specify the paths to the relevant files. We need to know the image path, the prototxt path, and the model path. Additionally, set a minimum confidence level for the object detection. A confidence level of 0.2 (20%) generally works well.

# Specify file paths
image_path = "images/room_people.jpeg"
prototxt_path = "models/mobile_net_ssd_deploy.prototxt"
model_path = "models/mobile_net_ssd.caffe.model"

# Set minimum confidence level
min_confidence = 0.2

Now, define the list of classes that the model can recognize, such as “person,” “car,” or “chair.” The order matters: the network returns a numeric class index for each detection, and that index is used to look up the name in this list. The list below assumes the widely distributed MobileNet SSD Caffe model, which was trained on the 20 PASCAL VOC categories plus a background class. If your model files come from a different source, check the label list that ships with them.

# Define the list of classes (index 0 is the background class)
classes = [
    "background",
    "aeroplane",
    "bicycle",
    "bird",
    "boat",
    "bottle",
    "bus",
    "car",
    "cat",
    "chair",
    "cow",
    "diningtable",
    "dog",
    "horse",
    "motorbike",
    "person",
    "pottedplant",
    "sheep",
    "sofa",
    "train",
    "tvmonitor",
]

Afterward, load the neural network model into your script using OpenCV’s cv2.dnn.readNetFromCaffe() function:

# Load the neural network model
net = cv2.dnn.readNetFromCaffe(prototxt_path, model_path)

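Now, we need to load the image from disk, resize it to match the input size of the neural network (300 x 300 pixels), and convert it into a blob that can be fed into the model for prediction.

# Load the image and remember its original size
image = cv2.imread(image_path)
if image is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise FileNotFoundError(f"Could not read image: {image_path}")
(height, width) = image.shape[:2]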

blob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)),
    0.007843,  # Scale factor
    (300, 300),  # Size
    127.5,  # Mean
)
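
To see what the scale factor and mean do, note that blobFromImage subtracts the mean from every pixel value and then multiplies the result by the scale factor. Since 0.007843 is roughly 1 / 127.5, pixel values from 0 to 255 end up in a range of about -1 to 1. The sketch below reproduces that arithmetic for single values; the helper name normalize_pixel is made up for this example.

```python
SCALE_FACTOR = 0.007843  # roughly 1 / 127.5
MEAN = 127.5


def normalize_pixel(value):
    """Mirror the arithmetic blobFromImage applies to one channel value."""
    return (value - MEAN) * SCALE_FACTOR


# Darkest, middle, and brightest pixel values
for value in (0, 127.5, 255):
    print(value, "->", round(normalize_pixel(value), 3))
```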

Once the image is preprocessed, we can pass it through the neural network and obtain the detected objects as the result.

# Forward the image through the network
net.setInput(blob)
detections = net.forward()
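
For this kind of SSD model, net.forward() returns a 4-dimensional array shaped (1, 1, N, 7). Each of the N rows describes one detection as [image_id, class_id, confidence, x1, y1, x2, y2], where the four coordinates are relative to the image size (0 to 1). The following sketch, which assumes that layout, turns the rows into named tuples; Detection and parse_detections are names made up for this example, and the sample rows are invented.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    class_index: int
    confidence: float
    box: tuple  # (x1, y1, x2, y2), each relative to the image size


def parse_detections(rows, min_confidence):
    """Turn raw detection rows into Detection tuples, dropping weak ones.

    rows is expected to be detections[0, 0] from net.forward(); plain
    nested lists work too, which keeps this sketch easy to try out.
    """
    results = []
    for row in rows:
        _image_id, class_id, confidence, x1, y1, x2, y2 = row
        if confidence > min_confidence:
            box = (float(x1), float(y1), float(x2), float(y2))
            results.append(Detection(int(class_id), float(confidence), box))
    return results


# Two invented rows: one confident detection and one weak one
sample_rows = [
    [0, 15, 0.91, 0.10, 0.20, 0.50, 0.90],
    [0, 7, 0.05, 0.30, 0.30, 0.40, 0.40],
]
for detection in parse_detections(sample_rows, min_confidence=0.2):
    print(detection)
```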

Now it’s time to iterate through the detected objects, filter them based on the minimum confidence level, and draw bounding boxes around them. We’ll also display the class and confidence score for each object.

# Pick a random color for each class
COLORS = np.random.uniform(0, 255, size=(len(classes), 3))

# Iterate over the detected objects
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]

    # Filter out low-confidence detections
    if confidence > min_confidence:
        class_index = int(detections[0, 0, i, 1])

        # Box coordinates are relative (0 to 1), so scale them to pixels
        upper_left_x = int(detections[0, 0, i, 3] * width)
        upper_left_y = int(detections[0, 0, i, 4] * height)
        lower_right_x = int(detections[0, 0, i, 5] * width)
        lower_right_y = int(detections[0, 0, i, 6] * height)

        # Draw the bounding box
        color = tuple(int(c) for c in COLORS[class_index % len(COLORS)])
        cv2.rectangle(image, (upper_left_x, upper_left_y),
                      (lower_right_x, lower_right_y), color, 2)

        # Display object class and confidence; move the text below the
        # top edge of the box when there is no room above it
        label = f"{classes[class_index]}: {confidence:.2f}"
        text_y = upper_left_y - 15 if upper_left_y > 30 else upper_left_y + 15
        cv2.putText(image, label, (upper_left_x, text_y),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
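
For objects at the edge of the picture, the relative coordinates can occasionally fall slightly below 0 or above 1, which would place a corner outside the image. One way to handle this is to clamp the scaled values, as in the sketch below; the helper name to_pixel_box is made up for this example.

```python
def to_pixel_box(box, width, height):
    """Scale a relative (x1, y1, x2, y2) box to pixels and clamp it."""
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        # Keep the coordinate between 0 and the last valid pixel index
        return max(0, min(int(value), upper))

    return (
        clamp(x1 * width, width - 1),
        clamp(y1 * height, height - 1),
        clamp(x2 * width, width - 1),
        clamp(y2 * height, height - 1),
    )


# A box that sticks out past the top and right edges of a 640 x 480 image
print(to_pixel_box((0.25, -0.5, 1.5, 0.5), width=640, height=480))
```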

Finally, we can display the image with the bounding boxes and labels using cv2.imshow(). Don’t forget to include cv2.waitKey(0) and cv2.destroyAllWindows() to gracefully exit the program.

# Display the image
cv2.imshow("Detected Objects", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Object Detection with Live Camera Feed

To take the object detection to the next level, let’s use a live camera feed instead of static images. We will modify our script to continuously capture frames from the camera and perform object detection on each of them. The file paths, class list, and model loading stay the same as before.

First, import the required libraries:

import cv2
import numpy as np

Next, initialize the camera capture object:

cap = cv2.VideoCapture(0)

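Inside an infinite loop, read frames from the camera one at a time and run the same detection steps on each frame (build the blob, forward it through the network, draw the boxes):

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break

    # Perform object detection on frame
    # (same blob, forward pass, and drawing steps as for the static image)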

    # Display the frame
    cv2.imshow("Live Object Detection", frame)

    # Break the loop on 'q' key press
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

Finally, release the camera and destroy all windows:

# Release the camera and close windows
cap.release()
cv2.destroyAllWindows()

Note: Remember to adjust the cap = cv2.VideoCapture(0) line based on your camera index. If you have multiple cameras, you may need to change the parameter to access a different one (e.g., cap = cv2.VideoCapture(1)).

That’s it! You have now learned how to implement live object detection in Python. Have fun experimenting with different objects and enjoy exploring the possibilities of computer vision.

FAQs

Q: Where can I find the pre-trained model files?

A: You can find the pre-trained model files, including the “mobile_net_ssd_deploy.prototxt” and “mobile_net_ssd.caffe.model” files, on github.com/user/chu-wonkey-305.

Q: How can I change the minimum confidence level for object detection?

A: To change the minimum confidence level, modify the min_confidence variable in your code. The default value is set to 0.2 (20%). Experiment with different values to find the optimal confidence level for your specific use case.
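
If you would rather not edit the script every time, the value can also be read from the command line. The following is a minimal sketch using Python’s standard argparse module; the --confidence option name is an assumption made for this example, not something the tutorial code already defines.

```python
import argparse


def parse_args(argv=None):
    """Read the minimum confidence from the command line (default 0.2)."""
    parser = argparse.ArgumentParser(description="Object detection demo")
    parser.add_argument(
        "--confidence",
        type=float,
        default=0.2,
        help="minimum confidence for a detection to be kept (0 to 1)",
    )
    args = parser.parse_args(argv)
    if not 0.0 <= args.confidence <= 1.0:
        parser.error("--confidence must be between 0 and 1")
    return args


# Passing a list simulates running: python main.py --confidence 0.5
args = parse_args(["--confidence", "0.5"])
min_confidence = args.confidence
print("Minimum confidence:", min_confidence)
```

In a real script you would call parse_args() without arguments so that the values come from the actual command line.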

Q: Can I use my own images for object detection?

A: Absolutely! You can use your own images by replacing the file path in the image_path variable with the location of your chosen image file. Ensure that the image file is in a supported format (e.g., JPEG, PNG).
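
To try several images in one go, you could collect every image in the images/ folder and process them one after another. This is only a sketch: the helper name list_images is made up, and the extension set covers just the formats mentioned above, although OpenCV can read others as well.

```python
from pathlib import Path

# Extensions treated as images in this sketch
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(folder="images"):
    """Return the image files in folder, sorted by name."""
    folder_path = Path(folder)
    if not folder_path.is_dir():
        return []
    return sorted(
        path
        for path in folder_path.iterdir()
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


for path in list_images():
    # str(path) is what you would assign to image_path
    print(path)
```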

Q: How can I adjust the display window size?

A: The display window size is determined by the resolution of the captured frames from the camera. You can adjust the window size by resizing the captured frames using the cv2.resize() function. Experiment with different parameters to achieve your desired window size.
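
When resizing, it is usually worth keeping the aspect ratio so that the picture is not stretched. The sketch below computes a new size for a chosen width; the helper name scaled_size is made up for this example.

```python
def scaled_size(width, height, target_width):
    """Return (new_width, new_height) that keeps the aspect ratio."""
    if width <= 0 or height <= 0 or target_width <= 0:
        raise ValueError("sizes must be positive")
    scale = target_width / width
    return target_width, max(1, round(height * scale))


# A 1280 x 720 frame shrunk to a width of 640 pixels
print(scaled_size(1280, 720, 640))  # (640, 360)
```

The result can be passed straight to cv2.resize(), which expects the size as (width, height), for example cv2.resize(frame, scaled_size(frame.shape[1], frame.shape[0], 640)).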

Conclusion

Congratulations! You have successfully learned how to implement live object detection in Python using OpenCV. We explored the step-by-step process, from setting up the project structure to applying object detection to static images and live camera feeds. Now you have the power to explore and experiment with the world of computer vision.

Remember to check out the official website of Techal for more insightful articles and tutorials on the latest technology trends. Stay curious, keep learning, and enjoy your tech-filled journey!

Written by Techal Team
