Object detection with machine learning

September 20, 2022

Object detection is a crucial task in machine learning, computer vision, and image processing. Its applications range from self-driving vehicles and surveillance systems to medical imaging and agriculture analysis. In this blog post, we will explore the basics of object detection, its techniques, and the tools and platforms that make it possible.

What is object detection?

Object detection is the process of identifying and locating objects of interest within an image or video stream. The objects can have various shapes, sizes, colors, and orientations, and can overlap or partially occlude each other. The goal of object detection is to provide a bounding box around each object and assign a label that describes the object’s class, such as person, car, or animal.

Object detection is a challenging problem due to several factors, including the variations in appearance, lighting conditions, camera angles, and backgrounds. Moreover, object detection requires a high degree of accuracy, as false positives or false negatives can have serious consequences, especially in safety-critical applications.

Object detection techniques

Several techniques can be used for object detection, depending on the type and complexity of the data and the computational resources available. We will overview some of the most common techniques:

Template matching

The template matching technique compares a pre-defined template, or pattern, with a portion of the input image to find a match. The template can be a binary mask or a grayscale image that represents the object’s shape and features. The comparison can be done using correlation or feature extraction methods. Template matching is simple and fast but has limitations in handling variations in scale, rotation, and occlusion.

import cv2

image = cv2.imread('input_image.jpg')
template = cv2.imread('template.jpg')
result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, _, _ = cv2.minMaxLoc(result)
top_left = cv2.minMaxLoc(result)[-2]
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(image , top_left, bottom_right, (0,255,0), 2)
cv2.imwrite('output_image.jpg', image)

Haar cascades

Haar cascades are a type of detector that uses machine learning to learn the features of the object of interest. Haar cascades are composed of several stages, each of which consists of a set of weak classifiers that are learned from positive and negative examples of the object. The weak classifiers use Haar-like features, such as edge, line, and corner detectors, to distinguish between the object and the background.

Haar cascades are effective in detecting faces, eyes, and other simple objects but may suffer from false positives and negatives when the object has variable shape, size, and position.

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
image = cv2.imread('input_image.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.imwrite('output_image.jpg', image)

Feature-based detection

Feature-based detection is a more sophisticated technique that uses the input image’s local features, such as corners, edges, and blobs, to detect and describe the objects. The features may be detected using detectors like Harris, FAST, or SURF, and then matched and clustered using descriptors like SIFT, ORB, or AKAZE. The matching and clustering process produces the correspondences between the features in the template and the input image, which can be used to estimate the object’s position and orientation.

Feature-based detection is more robust to variations in scale, rotation, and occlusion but requires more computational resources and has a higher rate of false positives and negatives.

import cv2

detector = cv2.xfeatures2d.SIFT_create()
matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), {})
image = cv2.imread('input_image.jpg')
template = cv2.imread('template.jpg')
kp1, des1 = detector.detectAndCompute(template, None)
kp2, des2 = detector.detectAndCompute(image, None)
matches = matcher.knnMatch(des1, des2, k=2)
good_matches = [m for m, n in matches if m.distance < 0.7 * n.distance]
if len(good_matches) > 10:
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
    h, w, _ = template.shape
    pts = np.float32([[0, 0], [0, h-1], [w-1, h-1], [w-1, 0]]).reshape(-1, 1, 2)
    dst = cv2.perspectiveTransform(pts, M)
    cv2.polylines(image, [np.int32(dst)], True, (0, 255, 0), 3, cv2.LINE_AA)
cv2.imwrite('output_image.jpg', image)

Object detection platforms

Object detection platforms offer pre-trained models and APIs that allow users to implement object detection quickly and efficiently without extensive knowledge of machine learning or computer vision. Some of the most popular object detection platforms are:

TensorFlow Object Detection API

The TensorFlow Object Detection API is an open-source framework that provides a collection of pre-trained models for object detection and segmentation. The API supports various models, such as Faster R-CNN, SSD, and YOLO, and can handle both single and multiple objects. The API is easy to use, scalable, and provides visualization tools and integration with TensorFlow ecosystem.

import tensorflow as tf
import cv2

model_path = 'path/to/saved/model'
image_path = 'input_image.jpg'

def load_model(path):
    model = tf.saved_model.load(path)
    return model

def detect_objects(image, model):
    image_tensor = tf.convert_to_tensor(image)
    image_tensor = tf.expand_dims(image_tensor, axis=0)
    detections = model(image_tensor)
    return detections

def draw_bbox(image, box, class_name, score):
    xmin, ymin, xmax, ymax = box
    cv2.rectangle(image, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
    cv2.putText(image, f"{class_name} {score:.2f}", (xmin, ymin-10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

model = load_model(model_path)
image = cv2.imread(image_path)
detections = detect_objects(image, model)
for box, class_id, score, _ in detections:
    class_name = class_names[class_id]
    draw_bbox(image, box, class_name, score)
cv2.imwrite('output_image.jpg', image)

OpenCV Object Detection

OpenCV Object Detection is a part of the OpenCV library that provides a set of pre-trained Haar cascades and other classifiers for object detection. OpenCV Object Detection is lightweight, fast, and can run on various platforms, including embedded systems and mobile devices.

import cv2

cascade_file = 'haarcascade_frontalface_default.xml'
image_file = 'image_file.jpg'

face_cascade = cv2.CascadeClassifier(cascade_file)
image = cv2.imread(image_file)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30), flags=cv2.CASCADE_SCALE_IMAGE)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2)
cv2.imwrite('output_image.jpg', image)

Amazon Rekognition

Amazon Rekognition is a cloud-based object detection service that provides image and video analysis, face analysis and recognition, and text detection and recognition. Amazon Rekognition uses deep learning models and can identify objects, faces, emotions, and activities in real-time. Amazon Rekognition is scalable, secure, and can integrate with other AWS services and applications.

import boto3

client = boto3.client('rekognition')
image_file = 'image_file.jpg'
collection_id = 'faces_collection'

def create_collection(collection_id):
    response = client.create_collection(CollectionId=collection_id)
    return response

def index_faces(collection_id, bucket_name, image_name):
    response = client.index_faces(CollectionId=collection_id, ExternalImageId=image_name, Image={'S3Object': {'Bucket': bucket_name, 'Name': image_name}})
    return response

def detect_objects(bucket_name, image_name):
    response = client.detect_labels(Image={'S3Object': {'Bucket': bucket_name, 'Name': image_name}}, MaxLabels=10)
    return response

create_collection(collection_id)
index_faces(collection_id, bucket_name, image_name)
detect_objects(bucket_name, image_name)

Conclusion

Object detection is a critical task in many fields, and several techniques and tools can accomplish it. Whether you want to detect faces on your phone or autonomous driving vehicles, you can find a method that suits your needs. By learning the basics of object detection, you can better understand the challenges and limitations of the process and appreciate the efforts of the researchers and engineers who make it possible.

Additional resources:

TensorFlow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection
OpenCV Object Detection: https://docs.opencv.org/master/d9/dba/classcv_1_1CascadeClassifier.html
Amazon Rekognition: https://aws.amazon.com/rekognition/