tesserocr
How to get a result similar to Pytesseract's image_to_data
Hi,
I need the detailed TSV data as output, including the following information:
- Text
- Width, height, x and y of each word
- Confidence of each word
- Level of each word
Historically, I have been using Pytesseract's image_to_data. Does this library have a similar API?
Thanks!
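One possible route, sketched here only under assumptions: if your tesserocr build exposes PyTessBaseAPI.GetTSVText(page_number) (a wrapper around Tesseract's own TSV output, which is the same format pytesseract's image_to_data reads from the tesseract CLI), you can parse that text into a dict of column lists. Please verify that GetTSVText exists in your installed version. The helper names parse_tsv and image_to_data are hypothetical, and the column list is Tesseract's TSV layout.

```python
"""Sketch: image_to_data-like output from tesserocr (GetTSVText is assumed to exist)."""
from typing import Dict, List

# Column layout of Tesseract's TSV output.
TSV_COLUMNS = [
    "level", "page_num", "block_num", "par_num", "line_num", "word_num",
    "left", "top", "width", "height", "conf", "text",
]
INT_COLUMNS = set(TSV_COLUMNS[:10])


def parse_tsv(tsv: str) -> Dict[str, List]:
    """Parse Tesseract TSV rows into a dict of column lists (header row optional)."""
    data: Dict[str, List] = {name: [] for name in TSV_COLUMNS}
    for line in tsv.splitlines():
        # Skip blank lines and a header row, if one is present.
        if not line or line.startswith("level\t"):
            continue
        fields = line.split("\t")
        # Non-word rows end with an empty text field; pad defensively.
        fields += [""] * (len(TSV_COLUMNS) - len(fields))
        for name, value in zip(TSV_COLUMNS, fields):
            if name in INT_COLUMNS:
                data[name].append(int(value))
            elif name == "conf":
                data[name].append(float(value))
            else:
                data[name].append(value)
    return data


def image_to_data(image, lang: str = "eng") -> Dict[str, List]:
    """Hypothetical helper: OCR a PIL image and return parsed TSV columns."""
    # Imported here so parse_tsv can be used without Tesseract installed.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImage(image)
        # Assumption: GetTSVText(page_number) is available in your tesserocr build.
        return parse_tsv(api.GetTSVText(0))
```

Rows with level 5 are words; page, block, paragraph and line rows carry a confidence of -1 and an empty text field, so filter on level if you only want words.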
I was wondering the same thing. The README states that image_to_text and file_to_text can be used with threading to process multiple images concurrently, which is very fast. I want the fastest way to call tesserocr, but no other helper functions are mentioned as suitable for use with threading. I saw that the PyTessBaseAPI class has all the methods needed to get the confidence, bounding box, level, etc. of recognized text, but I'm not sure how to use it from multiple threads.
Is there any advice on how to write a thread-safe equivalent of image_to_data?
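A common pattern, sketched here as an assumption rather than tesserocr's own API, is to give each worker thread its own PyTessBaseAPI instance and never share one instance between threads. This assumes, as is generally the case with Tesseract, that separate API instances can run in parallel while a single instance is not safe for concurrent use. The function map_with_thread_local_engine and its parameters are hypothetical; the engine factory and per-item work are passed in so the pattern is independent of which OCR calls you make.

```python
"""Sketch: one engine per worker thread (e.g. one PyTessBaseAPI per thread)."""
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional


def map_with_thread_local_engine(
    items: Iterable[Any],
    make_engine: Callable[[], Any],
    process: Callable[[Any, Any], Any],
    max_workers: int = 4,
    close_engine: Optional[Callable[[Any], None]] = None,
) -> List[Any]:
    """Run process(engine, item) for each item, reusing one engine per thread."""
    local = threading.local()
    created: List[Any] = []
    lock = threading.Lock()

    def worker(item: Any) -> Any:
        engine = getattr(local, "engine", None)
        if engine is None:
            # First task on this thread: create its private engine.
            engine = make_engine()
            local.engine = engine
            with lock:
                created.append(engine)
        return process(engine, item)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map yields results in input order.
            return list(pool.map(worker, items))
    finally:
        # The pool has shut down here, so no thread is still using an engine.
        if close_engine is not None:
            for engine in created:
                close_engine(engine)
```

With tesserocr this might be called as map_with_thread_local_engine(images, PyTessBaseAPI, ocr_one, close_engine=lambda api: api.End()), where ocr_one is your own function that calls api.SetImage(image) and then reads whatever results you need.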
Did you mean https://github.com/sirfz/tesserocr#getcomponentimages-example combined with iterating over the RIL levels?
That's a good point. I guess we can use the iterator with the level set to WORD, then retrieve the rectangle and confidence.
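The iterator approach described above can be sketched as follows. It relies on tesserocr's result-iterator API (PyTessBaseAPI.GetIterator, iterate_level, RIL.WORD, and the iterator's BoundingBox, Confidence and GetUTF8Text methods); word_rows and box_to_ltwh are hypothetical helper names, and the level value 5 is borrowed from Tesseract's TSV convention for word rows.

```python
"""Sketch: per-word boxes and confidences via tesserocr's result iterator."""
from typing import Any, Dict, List, Tuple

# Tesseract's TSV levels: 1 page, 2 block, 3 paragraph, 4 line, 5 word.
WORD_LEVEL = 5


def box_to_ltwh(box: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Convert an (x1, y1, x2, y2) box into (left, top, width, height)."""
    x1, y1, x2, y2 = box
    return x1, y1, x2 - x1, y2 - y1


def word_rows(image: Any) -> List[Dict[str, Any]]:
    """Return one dict per recognized word: level, left, top, width, height, conf, text."""
    # Imported lazily so box_to_ltwh can be used without Tesseract installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    rows: List[Dict[str, Any]] = []
    with PyTessBaseAPI() as api:
        api.SetImage(image)  # e.g. a PIL.Image
        api.Recognize()  # the iterator only has results after recognition
        ri = api.GetIterator()
        if ri is None:
            return rows
        for r in iterate_level(ri, RIL.WORD):
            box = r.BoundingBox(RIL.WORD)  # (x1, y1, x2, y2), or None if empty
            if box is None:
                continue
            left, top, width, height = box_to_ltwh(box)
            rows.append({
                "level": WORD_LEVEL,
                "left": left,
                "top": top,
                "width": width,
                "height": height,
                "conf": r.Confidence(RIL.WORD),
                "text": r.GetUTF8Text(RIL.WORD),
            })
    return rows
```

Because each call creates and closes its own PyTessBaseAPI, word_rows can be run from several threads at once; creating one engine per call costs some startup time per image, which a per-thread engine would avoid.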
(Email reply quoting zdenop's comment from Aug 26, 2021, 12:01 AM.)
The code for this was already provided in another issue by stefan6419846: https://github.com/sirfz/tesserocr/issues/300#issuecomment-1143322864