tesserocr
line_num with api.GetComponentImages(RIL.WORD, True)
@sirfz Hey, I want to get line_num as in pytesseract.image_to_data. I am using api.GetComponentImages(RIL.WORD, True), which gives me the word coordinates, block id, and paragraph id. Just like the block id, I want a line id. Can you please tell me how I can achieve this?
Just use RIL.TEXTLINE instead of RIL.WORD and use enumerate for counting.
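A minimal sketch of that suggestion, assuming GetComponentImages returns (image, box, block_id, paragraph_id) tuples as described in the question. The helper names number_text_lines and text_lines_from_image and the dictionary keys are made up for illustration; they are not part of tesserocr.

```python
def number_text_lines(components):
    """Attach a 1-based, page-wide line_num to each component tuple.

    `components` is assumed to be a list of
    (image, box, block_id, paragraph_id) tuples.
    """
    rows = []
    for line_num, (image, box, block_id, paragraph_id) in enumerate(components, start=1):
        rows.append({
            "line_num": line_num,          # counter produced by enumerate
            "block_id": block_id,
            "paragraph_id": paragraph_id,
            "box": box,                    # bounding box as returned by the API
        })
    return rows


def text_lines_from_image(path):
    """Hypothetical wrapper: run layout analysis on `path` and number its text lines."""
    # Imported here so number_text_lines stays usable without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL

    with PyTessBaseAPI() as api:
        api.SetImageFile(path)
        return number_text_lines(api.GetComponentImages(RIL.TEXTLINE, True))
```

Note that this counter runs across the whole page, whereas the line_num column in pytesseract.image_to_data restarts within each paragraph, so the two will only match directly on single-paragraph pages.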
If you want both the textline and the word images, then I recommend using the page/result iterator directly (for which GetComponentImages is just a wrapper). That is, api.GetIterator() and then it.Next() / it.Empty() / it.IsAtFinalElement(ril) etc. Or the Pythonic iterate_level generator. From the iterator, you can get BoundingBox(ril) as well as GetImage(ril), and of course the text and such.
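The iterator approach could look roughly like the sketch below: walk the page at word level with iterate_level and bump a line counter whenever the current word starts a new text line. The helper names words_with_line_numbers and words_from_image are hypothetical, and error handling (for example, words for which no text is returned) is left out.

```python
def words_with_line_numbers(words, word_level, line_level):
    """Collect text and bounding box per word, tagged with a running line_num.

    `words` is any iterable of iterator-like objects that provide
    IsAtBeginningOf(level), GetUTF8Text(level) and BoundingBox(level).
    """
    rows = []
    line_num = 0
    for word in words:
        # A word that begins a text line starts the next line number.
        if word.IsAtBeginningOf(line_level):
            line_num += 1
        rows.append({
            "line_num": line_num,
            "text": word.GetUTF8Text(word_level),
            "bbox": word.BoundingBox(word_level),
        })
    return rows


def words_from_image(path):
    """Hypothetical wrapper: recognize `path` and return its words with line numbers."""
    # Imported here so the helper above stays usable without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(path)
        api.Recognize()
        it = api.GetIterator()
        return words_with_line_numbers(
            iterate_level(it, RIL.WORD), RIL.WORD, RIL.TEXTLINE
        )
```

The same check against RIL.PARA or RIL.BLOCK could be used to reset or track paragraph and block counters, if numbering closer to pytesseract's output is wanted.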