ENHANCING BLIND NAVIGATION WITH OBJECT DETECTION USING CNNS

According to the World Health Organization, there are currently over 285 million visually impaired individuals worldwide, 39 million of whom are completely blind. With this number expected to rise in the coming years, there is a pressing need for technological solutions that can assist the blind in their daily lives. This project presents a solution that uses computer vision and convolutional neural networks (CNNs) to help the visually impaired navigate their surroundings. A camera (I used a PIXY camera) works as the input sensor, and a Raspberry Pi runs the code and processes the algorithms. The system then alerts the user either via a vibration motor attached to the handle of the stick or via a speaker, depending on the user's preference. All the components are embedded in the stick itself.
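As a rough illustration of the alert path on the Raspberry Pi, the sketch below drives a vibration motor through a GPIO pin or plays a sound, depending on the stored preference. The pin number, motor wiring, and sound file name are all assumptions, not the project's actual configuration.

```python
import time
import RPi.GPIO as GPIO
import pygame

VIBRATION_PIN = 18  # hypothetical BCM pin driving the motor circuit

GPIO.setmode(GPIO.BCM)
GPIO.setup(VIBRATION_PIN, GPIO.OUT)
pygame.mixer.init()

def alert_user(prefer_vibration: bool) -> None:
    """Alert via vibration or sound, based on the user's preference."""
    if prefer_vibration:
        GPIO.output(VIBRATION_PIN, GPIO.HIGH)  # pulse the motor for half a second
        time.sleep(0.5)
        GPIO.output(VIBRATION_PIN, GPIO.LOW)
    else:
        pygame.mixer.music.load("alert.wav")   # hypothetical alert sound file
        pygame.mixer.music.play()

alert_user(prefer_vibration=True)
GPIO.cleanup()
```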


MARK 1 : In the main loop, the code reads a frame from the video input, converts it from BGR to HSV color space, and applies color thresholding to extract the red regions. It then applies morphological operations to remove noise and fill gaps, finds contours in the resulting binary image, and loops over each contour. If a contour's area falls within a certain range, the code draws a bounding box around it and prints a message indicating that a stop sign has been detected; a sound file is also played through the speakers to alert the user. If no stop sign is detected, the code prints a message saying so. Finally, the video feed with the identified stop sign is displayed, and the user can exit the program by pressing the ESC key, at which point the VideoCapture object is released and all windows are closed. Overall, this code demonstrates a simple application of computer vision and sound playback for detecting stop signs in real-time video input, which could be helpful for the visually impaired.
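A minimal sketch of this pipeline is shown below. It assumes the default webcam as the video source and plausible HSV thresholds and contour-area bounds; the actual values in the project may differ, and the sound playback step is omitted for brevity. Note that red wraps around the hue axis in HSV, so two ranges are thresholded and combined.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # assumption: default webcam as the video source

while True:
    ok, frame = cap.read()
    if not ok:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Red wraps around the hue axis, so combine two thresholded ranges.
    lower = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255))
    upper = cv2.inRange(hsv, (160, 100, 100), (180, 255, 255))
    mask = cv2.bitwise_or(lower, upper)

    # Morphological open/close to remove noise and fill gaps.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    detected = False
    for c in contours:
        if 1000 < cv2.contourArea(c) < 50000:  # assumed plausible area range
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            detected = True

    print("Stop sign detected" if detected else "Stop sign not detected")
    # (Here the project would also play the alert sound through the speakers.)

    cv2.imshow("feed", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC exits
        break

cap.release()
cv2.destroyAllWindows()
```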


MARK 2 : To overcome the first and most important drawback of Mark 1, its ability to detect only red stop signs, a more sophisticated approach can be implemented using a Convolutional Neural Network (CNN). This approach involves training a model on a large dataset of stop sign images so that it recognizes variations of stop signs regardless of color or shape, in different orientations and scales, and under different lighting conditions and backgrounds. The trained model is then used to detect and classify stop signs in real-time video frames from a live stream. Using a CNN provides several advantages over the initial code. First, it allows more accurate detection of stop signs, regardless of their color or shape. Second, it can handle frames containing multiple red objects without producing false positives. Finally, it is a more versatile solution that can be applied to object detection tasks beyond stop signs.
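As an illustration of the real-time detection loop, the sketch below runs a trained binary stop-sign classifier on each frame. The model file name, 224x224 input size, and 0.5 decision threshold are assumptions, not details from the project.

```python
import cv2
from tensorflow.keras.models import load_model

# Assumption: a binary stop-sign CNN trained separately and saved to disk.
model = load_model("stop_sign_cnn.h5")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Match the assumed network input: RGB, 224x224, scaled to [0, 1].
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    inp = cv2.resize(rgb, (224, 224)).astype("float32") / 255.0
    prob = float(model.predict(inp[None, ...], verbose=0)[0][0])

    label = "Stop sign" if prob > 0.5 else "No stop sign"
    cv2.putText(frame, f"{label} ({prob:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("feed", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # ESC exits
        break

cap.release()
cv2.destroyAllWindows()
```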

VGG16 is a convolutional neural network model developed by the Visual Geometry Group at the University of Oxford. It was trained on a large-scale image classification task, namely classifying ImageNet images into one of 1,000 categories. VGG16 consists of 13 convolutional layers followed by 3 fully connected layers, and it is widely used for computer vision tasks, including image classification and object detection.
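One way such a model could be trained is by transfer learning on top of VGG16, roughly as sketched below: the pretrained convolutional layers are frozen and a small classification head is added for the stop-sign task. The head architecture and training hyperparameters are assumptions.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# Load VGG16 pretrained on ImageNet, without its 1,000-way classification head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the 13 convolutional layers for transfer learning

# Attach a small head for the binary stop-sign task (head design is an assumption).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(...) would then be called on the labeled stop-sign dataset.
```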


MARK 3 : Mark 1 had the limitation that it could only detect stop signs, and this was not resolved in Mark 2. Mark 3 addresses the issue by incorporating a wider variety of important traffic signs. To detect traffic signs in images, the YOLOv5 ("You Only Look Once") algorithm was used. YOLOv5 is a well-known object detection algorithm that uses a single neural network for both object classification and bounding box regression: it takes an image as input and outputs the coordinates and labels of the objects in the image. For this implementation, a pre-trained YOLOv5 model was fine-tuned on a dataset of traffic sign images and labels using transfer learning, and the resulting model was used to detect traffic signs in test images. Overall, this approach demonstrates the effectiveness of YOLOv5 at detecting traffic signs and could be useful for real-world applications such as traffic management systems and self-driving cars.
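Once fine-tuning has produced a weights file, the model can be loaded and run through PyTorch Hub roughly as follows. The weights path and test image name are assumptions; best.pt is simply the default name the YOLOv5 training script gives its best checkpoint.

```python
import torch

# Assumption: fine-tuned traffic-sign weights saved as best.pt during training.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

results = model("test_sign.jpg")       # hypothetical test image
results.print()                        # summary of detections
detections = results.pandas().xyxy[0]  # boxes, confidences, class labels
print(detections[["xmin", "ymin", "xmax", "ymax", "confidence", "name"]])
```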

The YOLOv5 algorithm is a state-of-the-art object detection algorithm that uses convolutional neural networks to detect and classify objects in images. It uses anchor boxes to define the shape and size of candidate objects and predicts class probabilities and bounding box coordinates for each anchor box. YOLOv5 is trained on a combination of image and label data, where the labels record the locations and classes of the objects in the images. Its main advantage over other object detection algorithms is its combination of speed and accuracy: it can detect objects in real time with high accuracy and can be trained on large datasets using modern GPUs. The algorithm comes in several variants, including YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, with varying numbers of layers and model sizes.
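The variants can be loaded by name through PyTorch Hub; the short loop below compares their parameter counts (it assumes internet access on the first run, after which the downloaded weights are cached).

```python
import torch

for variant in ["yolov5s", "yolov5m", "yolov5l", "yolov5x"]:
    model = torch.hub.load("ultralytics/yolov5", variant, pretrained=True)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{variant}: {n_params / 1e6:.1f}M parameters")
```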

References:
https://builtin.com/machine-learning/vgg16
https://oyyarko.medium.com/opencv-brief-note-on-how-you-can-access-webcam-in-google-colab-d4d84efc301f
https://www.mathworks.com/help/gpucoder/ug/code-generation-for-traffic-sign-detection-and-recognition-networks.html
https://github.com/ultralytics/yolov5
https://github.com/satojkovic/DeepTrafficSign
https://stackoverflow.com/questions/40821954/no-module-named-imutils-perspective-after-pip-installing
https://pytorch.org/hub/ultralytics_yolov5/

Project Link:
https://github.com/sahq-azhar/CS5330-PRCV-FinalProject