Python 3 port may have broken ocr.py
The Python 3 port in #3 (cb0530f85b2860e0df6a1244dc3a11f98264c0f6) and/or #4 (356e77f7af312296bd612d154dd0a465c1167cb7) replaced tesseract with pytesseract, but I don't think the two modules expose the same API (or the code targeted a really old version), since I'm getting this error:
Traceback (most recent call last):
  File "ocr.py", line 224, in <module>
    main()
  File "ocr.py", line 219, in main
    blurbs = ocr_on_bounding_boxes(binary, components)
  File "ocr.py", line 134, in ocr_on_bounding_boxes
    api = pytesseract.TessBaseAPI()
AttributeError: module 'pytesseract' has no attribute 'TessBaseAPI'
The ocr.py code may need to be rewritten to work with the latest pytesseract:
https://github.com/madmaze/pytesseract
I'm on Python 3.8.
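For what it's worth, here is a rough sketch of what a pytesseract-based replacement might look like. As far as I can tell, pytesseract wraps the tesseract command-line binary and exposes plain functions such as `pytesseract.image_to_string` rather than a `TessBaseAPI` object. The helper name `ocr_on_boxes`, the `(left, top, right, bottom)` box format, the `ocr` parameter, and the `"eng"` default language are all my own assumptions for illustration; they are not taken from ocr.py.

```python
def ocr_on_boxes(image, boxes, ocr=None, lang="eng"):
    """Run OCR on each box of a PIL-style image.

    `boxes` is an iterable of (left, top, right, bottom) tuples (an assumed
    format, not necessarily what ocr.py uses). Returns a list of
    (box, text) pairs, skipping boxes where no text was recognized.
    """
    if ocr is None:
        # Imported lazily so the helper can be exercised without
        # pytesseract or the tesseract binary installed.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = []
    for box in boxes:
        region = image.crop(box)  # PIL.Image.Image.crop takes a 4-tuple
        text = ocr(region, lang=lang).strip()
        if text:
            results.append((box, text))
    return results
```

With Pillow installed, this would presumably be called as `ocr_on_boxes(Image.open("page.png"), boxes)`, where `page.png` is a placeholder file name. Passing the OCR function as a parameter is just a convenience so the cropping logic can be tried without Tesseract present.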
Thanks for reporting this. I really haven't been using or maintaining this repo in years. I might take a stab at fixing this over the weekend, but I don't have much interest in maintaining Python code, which is plagued with versioning issues. Still, I'm a little happy that someone somewhere is playing with this code.