surya
Urdu Text Does Not Get Detected
First things first, sincere appreciation for your outstanding work in developing this incredible AI-driven OCR library. It's a fantastic tool that holds immense potential for digital humanities, which is the subject I study.
I started my testing with some old Urdu historical documents, and unfortunately no bounding boxes (bboxes) were detected for the Urdu text in those documents.
Subsequently, I tested it with an image that contains a mix of Hindi, English, and Urdu text. It successfully detected the Hindi and English portions, but it picked up only one line of the Urdu text, which was less than I expected. I have attached the image for your reference so that you can better understand the scenario.
Try the new code/model: pip install -U surya-ocr
This seems to work.
You may need to experiment with the threshold settings to detect more text (see README)
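To make the threshold advice a bit more concrete, here is a toy sketch of why lowering a detection threshold can surface more lines. It is not surya's actual code: the function `detect_lines`, the `row_scores` values, and the threshold numbers are all made up for illustration. Recent versions of the README describe settings named DETECTOR_TEXT_THRESHOLD and DETECTOR_BLANK_THRESHOLD, but treat those names as an assumption and check the README for the version you have installed.

```python
from typing import List, Tuple


def detect_lines(row_scores: List[float], text_threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive rows scoring >= text_threshold into (start, end) spans.

    `end` is exclusive. This is a hypothetical stand-in for a detector's
    post-processing step, not the library's implementation.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, score in enumerate(row_scores):
        if score >= text_threshold:
            if start is None:
                start = i  # a new text line begins here
        elif start is not None:
            spans.append((start, i))  # the current line ended on the previous row
            start = None
    if start is not None:
        spans.append((start, len(row_scores)))  # line runs to the last row
    return spans


# Made-up per-row text confidences: two confident lines and one faint line
# (rows 4-5), similar to a script the model is less sure about.
row_scores = [0.05, 0.9, 0.85, 0.1, 0.45, 0.5, 0.08, 0.92, 0.1]

strict = detect_lines(row_scores, text_threshold=0.6)
lenient = detect_lines(row_scores, text_threshold=0.4)

print("strict :", strict)
print("lenient:", lenient)
```

With the stricter threshold the faint middle line is dropped, and with the lower one it is kept. The trade-off is that a threshold set too low may also start picking up noise, so it is worth trying a few values on your own documents.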
Noted with thanks