Designing an Assistive Tool for Visually Impaired People Based on Object Detection Technique
Keywords:
deep learning, YOLO, object detection, visually impaired, text-to-speech

Abstract
Visually impaired individuals often face significant challenges in navigating their environments due to limited access to visual information. To address this issue, we propose an assistive tool designed to run on a PC. This research focuses on developing an efficient, lightweight object detection system that delivers real-time performance while remaining compatible with low-resource setups. The proposed system enhances the autonomy and accessibility of visually impaired individuals by providing audio descriptions of their surroundings from live-streaming video. At the core of the system is an object detection module based on the state-of-the-art YOLOv7 model, designed to identify multiple objects in the user's environment in real time. The system processes video frames captured by a camera, identifies the objects they contain, and delivers the results as audio descriptions using the pyttsx3 text-to-speech library, ensuring offline functionality and robust performance. Evaluated through quantitative metrics and subjective assessments, the system demonstrates satisfactory results, achieving object detection inference times between 0.12 and 1.14 seconds per frame. In conclusion, the proposed tool effectively aids visually impaired individuals by providing accurate and timely audio descriptions, thereby promoting greater independence and accessibility.

References
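The capture-detect-speak pipeline summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `detect_objects` stub stands in for the YOLOv7 forward pass (whose loading code is not given in the abstract), and the sentence wording in `describe_detections` is an assumption. Only the camera loop (OpenCV) and the offline announcement (pyttsx3's `init`/`say`/`runAndWait`) follow APIs named or implied by the paper.

```python
# Sketch of the capture -> detect -> speak loop described in the abstract.
# detect_objects is a placeholder; the real system runs a YOLOv7 model here.

def detect_objects(frame):
    """Placeholder for the YOLOv7 forward pass.

    In the real system this would run YOLOv7 on the frame and return the
    class names of the detected objects, e.g. ["person", "chair"].
    """
    return []

def describe_detections(labels):
    """Turn detected class labels into a short sentence for text-to-speech."""
    if not labels:
        return "No objects detected."
    counts = {}
    for label in labels:                      # count repeated classes
        counts[label] = counts.get(label, 0) + 1
    parts = [f"{n} {name}s" if n > 1 else name for name, n in counts.items()]
    return "I can see " + ", ".join(parts) + "."

def speak(text):
    """Announce the description offline with the pyttsx3 engine."""
    import pyttsx3                            # lazy import: optional dependency
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

def main():
    import cv2                                # OpenCV for camera capture
    cap = cv2.VideoCapture(0)                 # default webcam
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            speak(describe_detections(detect_objects(frame)))
    finally:
        cap.release()

if __name__ == "__main__":
    main()
```

Keeping `describe_detections` as a pure function separates the detection and speech stages, which matches the modular design the abstract describes and makes the sentence construction testable without a camera or audio device.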
[1] V. Kumar, V. Teja, A. Kumar, V. Harshavardhan and U. Sahith, "Image Summarizer for the Visually Impaired Using Deep Learning," 2021 International Conference on System, Computation, Automation and Networking (ICSCAN), Puducherry, India, 2021, pp. 1-4, doi: 10.1109/ICSCAN53069.2021.9526465.
[2] B. Arystanbekov, A. Kuzdeuov, S. Nurgaliyev and H. A. Varol, "Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages," 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 2023, pp. 1-4, doi: 10.1109/EMBC40787.2023.10340575.
[3] A. Chharia and R. Upadhyay, "Deep Recurrent Architecture based Scene Description Generator for Visually Impaired," 2020 12th International Congress on Ultra-Modern Telecommunications and Control Systems and Workshops (ICUMT), Brno, Czech Republic, 2020, pp. 136-141, doi: 10.1109/ICUMT51630.2020.9222441.
[4] C. C et al., "Image/Video Summarization in Text/Speech for Visually Impaired People," 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon), Mysuru, India, 2022, pp. 1-6, doi: 10.1109/MysuruCon55714.2022.9972653.
[5] T. Mohandoss and J. Rangaraj, "Multi-Object Detection using Enhanced YOLOv2 and LuNet Algorithms in Surveillance Videos," e-Prime - Advances in Electrical Engineering, Electronics and Energy, vol. 8, 2024, art. no. 100535, doi: 10.1016/j.prime.2024.100535.
[6] M. Sarkar, S. Biswas and B. Ganguly, "A Hybrid Transfer Learning Architecture Based Image Captioning Model for Assisting Visually Impaired," 2023 IEEE 3rd Applied Signal Processing Conference (ASPCON), India, 2023, pp. 211-215, doi: 10.1109/ASPCON59071.2023.10396262.
[7] A. Yousif and M. Al-Jammas, "Exploring Deep Learning Approaches for Video Captioning: A Comprehensive Review," e-Prime - Advances in Electrical Engineering, Electronics and Energy, vol. 6, 2023, art. no. 100372, doi: 10.1016/j.prime.2023.100372.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 580-587, doi: 10.1109/CVPR.2014.81.
[9] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single Shot MultiBox Detector," in European Conference on Computer Vision (ECCV), Springer, 2016, pp. 21-37.
[10] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 779-788, doi: 10.1109/CVPR.2016.91.
[11] B. Xiao, J. Guo, and Z. He, "Real-Time Object Detection Algorithm of Autonomous Vehicles Based on Improved YOLOv5s," 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 2021, pp. 1-6, doi: 10.1109/CVCI54083.2021.9661149.
[12] P. Zhang, W. Hou, D. Wu, B. Ge, L. Zhang, and H. Li, "Real-Time Detection of Small Targets for Video Surveillance Based on MS-YOLOv5," 2023 6th International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 2023, pp. 690-694, doi: 10.1109/ICAIBD57115.2023.10206275.
[13] Y. Yang, "Drone-View Object Detection Based on the Improved YOLOv5," 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA), Changchun, China, 2022, pp. 612-617, doi: 10.1109/EEBDA53927.2022.9744741.
[14] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023, pp. 7464-7475.
[15] S. Chourasia, R. Bhojane and L. Heda, "Safety Helmet Detection: A Comparative Analysis Using YOLOv4, YOLOv5, and YOLOv7," 2023 International Conference for Advancement in Technology (ICONAT), Goa, India, 2023, pp. 1-8, doi: 10.1109/ICONAT57137.2023.10080723.
[16] T. Reddy Konala, A. Nammi and D. Sree Tella, "Analysis of Live Video Object Detection using YOLOv5 and YOLOv7," 2023 4th International Conference for Emerging Technology (INCET), Belgaum, India, 2023, pp. 1-6, doi: 10.1109/INCET57972.2023.10169926.
[17] I. Hilali, A. Alfazi, N. Arfaoui and R. Ejbali, "Tourist Mobility Patterns: Faster R-CNN Versus YOLOv7 for Places of Interest Detection," in IEEE Access, vol. 11, pp. 130144-130154, 2023, doi: 10.1109/ACCESS.2023.3334633.
[19] A. S. Alva, R. Nayana, N. Raza, G. S. Sampatrao and K. B. S. Reddy, "Object Detection and Video Analyser for the Visually Impaired," 2023 Third International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2023, pp. 1405-1412, doi: 10.1109/ICAIS56108.2023.10073662.
[20] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv preprint arXiv:1409.1556, 2014.
[21] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[22] A. Bodi, P. Fazli, S. Ihorn, Y. Siu, A. Scott, L. Narins, Y. Kant, A. Das, and I. Yoon, "Automated Video Description for Blind and Low Vision Users," in Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1-7, doi: 10.1145/3411763.3451810.
[23] Y.-H. Huang and Y.-Z. Hsieh, "The Assisted Environment Information for Blind based on Video Captioning Method," 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, 2020, pp. 1-2, doi: 10.1109/ICCE-Taiwan49838.2020.9258088.
[24] D. Chen and W. Dolan, "Collecting Highly Parallel Data for Paraphrase Evaluation," in Proceedings of ACL: Human Language Technologies, vol. 1, 2011, pp. 190-200.

License
Copyright (c) 2025 Ghazwan Jabbar Ahmed, Farah Hatem Khorsheed

This work is licensed under a Creative Commons Attribution 4.0 International License.