tesserocr line_num with api.GetComponentImages(RIL.WORD, True)

Open kbrajwani opened this issue 5 years ago • 1 comments

@sirfz Hey, I want to get line_num as in pytesseract.image_to_data. I am using api.GetComponentImages(RIL.WORD, True), which gives me each word's coordinates, block id, and paragraph id. Just like the block id, I also want a line id. Can you please tell me how I can achieve this?

Oct 12 '20 10:10 kbrajwani

Just use RIL.TEXTLINE instead of RIL.WORD and use enumerate for counting.
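A minimal sketch of that suggestion, in case it helps. The helper names `number_lines` and `text_lines` and the dictionary keys are made up for illustration; only `PyTessBaseAPI`, `SetImageFile`, `GetComponentImages`, and `RIL.TEXTLINE` come from tesserocr. Note that this counter runs across the whole page, which may not match how pytesseract numbers lines.

```python
def number_lines(components):
    """Attach a 1-based line_num to each (image, box, block_id, par_id) tuple.

    `components` is assumed to have the shape returned by
    api.GetComponentImages(RIL.TEXTLINE, True).
    """
    rows = []
    for line_num, (_image, box, block_id, par_id) in enumerate(components, start=1):
        rows.append({
            "line_num": line_num,   # page-wide counter from enumerate
            "block_id": block_id,
            "par_id": par_id,
            "left": box["x"],
            "top": box["y"],
            "width": box["w"],
            "height": box["h"],
        })
    return rows


def text_lines(image_path):
    """Run layout analysis on an image file and return numbered text lines."""
    # Imported lazily so number_lines() can be used without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        return number_lines(api.GetComponentImages(RIL.TEXTLINE, True))
```

Calling `text_lines("page.png")` (a placeholder path) should then give one dictionary per text line, in the order tesseract reports them.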

If you want both the textline and the word images, then I recommend using the page/result iterator directly (for which GetComponentImages is just a wrapper). That is, api.GetIterator() and then it.Next() / it.Empty() / it.IsAtFinalElement(ril) etc. Or the Pythonic iterate_level generator. From the iterator, you can get BoundingBox(ril) as well as GetImage(ril), and of course the text and such.
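A rough sketch of the iterator approach, assuming a line counter that increments whenever the word iterator sits at the start of a text line. `words_with_line_num` and `read_words` are hypothetical helper names; `iterate_level`, `GetIterator`, `Recognize`, `IsAtBeginningOf`, `GetUTF8Text`, and `BoundingBox` are the tesserocr calls being relied on.

```python
def words_with_line_num(word_iter, word_level, line_level):
    """Yield (line_num, text, bbox) for every word the iterator visits.

    `word_iter` is expected to behave like iterate_level(api.GetIterator(), RIL.WORD):
    each item exposes IsAtBeginningOf, GetUTF8Text and BoundingBox.
    """
    line_num = 0
    for it in word_iter:
        # A word that starts a text line opens a new line number.
        if it.IsAtBeginningOf(line_level):
            line_num += 1
        yield line_num, it.GetUTF8Text(word_level), it.BoundingBox(word_level)


def read_words(image_path):
    """Recognize an image file and return its words tagged with line numbers."""
    # Imported lazily so the generator above can be exercised without tesserocr.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        words = iterate_level(api.GetIterator(), RIL.WORD)
        # Materialize the results while the API object is still open.
        return list(words_with_line_num(words, RIL.WORD, RIL.TEXTLINE))
```

The same pattern could track block and paragraph counters by checking `IsAtBeginningOf` with `RIL.BLOCK` and `RIL.PARA`, and resetting the line counter there if per-paragraph numbering is wanted.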

Jul 02 '21 18:07 bertsky
