How to get result similar to Pytesseract's image_to_data

Open ichenjia opened this issue 4 years ago • 3 comments

Hi,

I need detailed TSV-style output that includes the following info:

  1. Text
  2. width, height, x, y of each word
  3. Confidence of each word
  4. Level of each word

Historically, I have been using Pytesseract's image_to_data. Does this library have a similar API?

Thanks!

ichenjia avatar Aug 25 '21 06:08 ichenjia

I was wondering exactly the same thing. The README states that calling image_to_text and file_to_text from multiple threads is very quick. I want the fastest approach available in tesserocr, but no other helper functions are described as suitable for threading. The PyTessBaseAPI class has all the methods needed to get the confidence, bounding box, level, etc. of recognized text, but I'm not sure how to combine it with threading. Is there any advice on writing a thread-safe equivalent of image_to_data?

parulsingh23 avatar Aug 26 '21 03:08 parulsingh23
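One possible way to approach the threading question, as a rough sketch rather than a recommendation from the library: give each worker thread its own PyTessBaseAPI instance through `threading.local`, so that no instance is shared between threads. The helper names (`thread_api`, `make_tesseract_api`, `ocr_file`, `ocr_many`) are hypothetical and not part of tesserocr. The sketch also assumes that one instance per thread is safe and that the underlying calls release the GIL often enough for threads to help; neither is confirmed in this thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def thread_api(factory):
    """Return the calling thread's engine, creating it on first use."""
    api = getattr(_local, "api", None)
    if api is None:
        api = factory()
        _local.api = api
    return api


def make_tesseract_api():
    # Imported lazily so this module loads even where tesserocr is absent.
    from tesserocr import PyTessBaseAPI
    return PyTessBaseAPI()


def ocr_file(path, factory=make_tesseract_api):
    """Run OCR on one file with this thread's private engine."""
    api = thread_api(factory)
    api.SetImageFile(path)
    # Return the full text plus the per-word confidences (0-100).
    return api.GetUTF8Text(), list(api.AllWordConfidences())


def ocr_many(paths, factory=make_tesseract_api, max_workers=4):
    """OCR several files in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: ocr_file(p, factory), paths))
```

Calling `ocr_many(["page1.png", "page2.png"])` would return one `(text, confidences)` pair per file. The per-thread instances in this sketch are never explicitly closed with `End()`, so a long-running program would probably want to add cleanup.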

Did you mean https://github.com/sirfz/tesserocr#getcomponentimages-example + iterate over RIL?

zdenop avatar Aug 26 '21 07:08 zdenop

That’s a good point. I am guessing we can use the iterator, set the level to WORD, and then retrieve the rect and confidence for each word.

ichenjia avatar Aug 26 '21 07:08 ichenjia
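The iterator idea discussed above could look roughly like the following. This is a minimal sketch, not the code from the linked issue: `word_rows` and `image_to_data` are hypothetical helper names, the column set is only a subset of what Pytesseract's image_to_data returns, and the `level` value 5 is filled in to mirror Pytesseract's word level. It assumes that `BoundingBox` returns a `(left, top, right, bottom)` tuple and that `GetUTF8Text` may raise `RuntimeError` for an empty result.

```python
def word_rows(words, level, tsv_level=5):
    """Build one dict per word from an iterable of result-iterator steps."""
    rows = []
    for word in words:
        box = word.BoundingBox(level)
        if box is None:  # nothing recognized at this position
            continue
        left, top, right, bottom = box
        try:
            text = word.GetUTF8Text(level)
        except RuntimeError:  # assumption: raised when no text is returned
            text = ""
        rows.append({
            "level": tsv_level,
            "left": left,
            "top": top,
            "width": right - left,
            "height": bottom - top,
            "conf": word.Confidence(level),
            "text": text,
        })
    return rows


def image_to_data(path, lang="eng"):
    """Word-level rows for an image file, loosely like Pytesseract's output."""
    # Imported lazily so word_rows can be used without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI(lang=lang) as api:
        api.SetImageFile(path)
        api.Recognize()
        # Consume the iterator while the API object is still open.
        return word_rows(iterate_level(api.GetIterator(), RIL.WORD), RIL.WORD)
```

Each returned row is a plain dict, so writing TSV with the `csv` module or loading the rows into a DataFrame should be straightforward.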

The code for this was already provided in another issue by stefan6419846: https://github.com/sirfz/tesserocr/issues/300#issuecomment-1143322864

rsnitsch avatar Sep 08 '23 13:09 rsnitsch