GSSoC'24: OCR Detection
Resolves #56
The pull request for the OCR Detection resolving the feature enhancement.
OCR_Detection
The OCR Detection is introduced in order to help the victims in the situation where they can not use other means to communicate or seek help, other than written communication shown to the camera. Also, it helps in detecting potential self-harm when the victim is in the process of writing the death note, and the camera catches a glimpse of it and can use existing models to determine the scale of the threat that uses text as their primary input.
Usage
It is to be kept in mind that a window will only be created if there are text detected by the model. For visualizing another image, that window has to be closed in order for the another window to appear.
demo.pyto start the web camera for obtaining frames.- Ctrl + C to exit from the script.
- If a text is detected, a new window opens with the text detected, annotations and the confidence. The detected text is also printed for convenience.
Working
easyocr package is used to provide image to text detection. Model_Data contains the downloaded model to reduce the online dependancy.
detect.pycontains the functions that can be imported by other scripts to be executed to perform image to text detection.demo.pycontains a demo code which showcases the functionality.
OpenCV without GUI (opencv-python-headless) is used to optimize the script for detection. It is useful in optimizing the detection speed by removing useless processes used for GUI.
Additionally, for web integration, GUI is not needed but the other functionalities remains the same.
demo.py also contains an optimization which prevents the execution of the model detection if the frame difference is less, i.e., the frames hasn't changed much. MSE (Mean Squared Error) is used to calculate the difference between the two frame. The model only gets executed, if the error is greater than 20. This can be modified by changing the value of ERR_DIFF.
Multi-processing can be used to get seamless detections without delay.
Demo
- The image is purely for the demonstration purposes. The red text shows the detected text (the detected text is large so it is cutting from the screen). The bounding boxes, the green text above it and the confidence is for visualization purposes.
- The more the resolution of the camera, the better the results.
@TAHIR0110 Please review the pull request, and let me know if changes are required or not.
@TAHIR0110 Please also add the labels in the PR, the same that is mentioned in the issue.
I would like to work on this, can you please assign this to me?
@SAM-DEV007 I have merged it and labelled it as level3 instead of level1.