pyocr
A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
For some reason, the references differ from the actual results. The actual results seem to be good, so it's probably a bug in `update_test_data.sh`.
Is it possible to get a confidence score for the predictions (not the orientation)?
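One possible route to per-word confidence, sketched here without relying on any pyocr API: Tesseract's hOCR output records an `x_wconf` property in the `title` attribute of each `ocrx_word` span, so the values can be read back from the hOCR text. The helper names `parse_confidence` and `word_confidences` are hypothetical, and the sketch assumes you already have the hOCR document as a string.

```python
from html.parser import HTMLParser


def parse_confidence(title):
    """Return the x_wconf value from an hOCR title attribute, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) == 2 and parts[0] == "x_wconf":
            try:
                return float(parts[1])
            except ValueError:
                return None
    return None


class WordConfidenceParser(HTMLParser):
    """Collect (text, confidence) pairs from hOCR 'ocrx_word' spans."""

    def __init__(self):
        super().__init__()
        self.words = []      # finished (text, confidence) pairs
        self._chunks = None  # text pieces of the word being read, else None
        self._conf = None    # confidence of the word being read
        self._depth = 0      # nesting depth of <span> inside the word

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._chunks is not None:
            # A span nested inside the current word span.
            self._depth += 1
            return
        attr = dict(attrs)
        if "ocrx_word" in (attr.get("class") or "").split():
            self._chunks = []
            self._depth = 1
            self._conf = parse_confidence(attr.get("title") or "")

    def handle_endtag(self, tag):
        if tag != "span" or self._chunks is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.words.append(("".join(self._chunks).strip(), self._conf))
            self._chunks = None

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)


def word_confidences(hocr):
    """Parse an hOCR string and return a list of (word, confidence)."""
    parser = WordConfidenceParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

Words whose `title` has no `x_wconf` entry come back with `None` as their confidence, so callers can tell a missing score apart from a score of zero.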
I looked at the code in https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/libtesseract/tesseract_raw.py#L381 and https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/libtesseract/__init__.py#L110. It shows that libtesseract supports the numeric mode, so why doesn't `get_available_builders` allow us to use `DigitBuilder`?
Hi Team, I am currently using pyocr with Tesseract 3.05.01, and I call pyocr.get_available_tools() to get Tesseract. Is there any way to set preserve_interword_spaces for Tesseract through pyocr?
--load_system_dawg 0 would be helpful as an argument in image_to_text, perhaps as an options dictionary. Feel free to call it something that makes it language agnostic
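To illustrate what an options dictionary could look like, here is a minimal sketch that turns one into Tesseract command-line arguments. It is not pyocr's API: `build_tesseract_command` is a hypothetical helper, and it only assumes that the `tesseract` executable accepts `-l LANG` and `-c VAR=VALUE`, which is how variables such as `preserve_interword_spaces` and `load_system_dawg` can be set from the command line.

```python
def build_tesseract_command(image_path, output_base, lang=None, options=None):
    """Build an argv list for the tesseract CLI (hypothetical helper).

    `options` maps Tesseract variable names to values; each entry becomes
    a `-c NAME=VALUE` pair. Booleans are written as 0 or 1.
    """
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    # Sort the names so the resulting command is deterministic.
    for name, value in sorted((options or {}).items()):
        if isinstance(value, bool):
            value = int(value)
        cmd += ["-c", "{}={}".format(name, value)]
    return cmd


command = build_tesseract_command(
    "page.png",
    "out",
    lang="eng",
    options={"preserve_interword_spaces": 1, "load_system_dawg": False},
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where Tesseract is installed. Keeping the options as a plain name-to-value mapping means the helper does not need to know about any particular variable or language.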
Someone has been reporting crashes of Paperwork when running the OCR. They are using Tesseract 3.04.01, so there may be something wrong with the libtesseract binding. (Note: currently, the...
Cuneiform tends to stop reading pages when it reaches a large non-readable area. Because of this, when using Cuneiform, not all the keywords are actually extracted. A way to work...
Even when using the LineBoxBuilder, it seems too much data is stripped from the hOCR files.