pyocr issues

Tests/Tesseract: Differences between Pyocr and reference output

5

For some reason, there are differences between the reference output and the actual results. The actual results seem to be correct, so it is probably a bug in `update_test_data.sh`.

jflesch

bug

to study

Confidence score

6

Is it possible to get a confidence score for the predictions (not the orientation)?

MathieuCliche

feature request
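
One possible route to per-word confidence, sketched here under assumptions: Tesseract's hOCR output normally carries an `x_wconf` value in the `title` attribute of each `ocrx_word` span. The snippet below parses such a document with the Python standard library only. The class and function names (`WordConfidenceParser`, `word_confidences`) and the sample markup are hypothetical and are not part of pyocr.

```python
import re
from html.parser import HTMLParser


class WordConfidenceParser(HTMLParser):
    """Collect (word, confidence) pairs from hOCR 'ocrx_word' spans."""

    def __init__(self):
        super().__init__()
        self.words = []        # list of (text, confidence or None)
        self._conf = None      # confidence of the word being read
        self._in_word = False  # True while inside an ocrx_word span
        self._text = []        # text fragments of the current word

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            # The title looks like: "bbox 0 0 40 20; x_wconf 93"
            match = re.search(r"x_wconf\s+(\d+(?:\.\d+)?)",
                              attrs.get("title") or "")
            self._conf = float(match.group(1)) if match else None
            self._in_word = True
            self._text = []

    def handle_data(self, data):
        if self._in_word:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_word:
            self.words.append(("".join(self._text).strip(), self._conf))
            self._in_word = False


def word_confidences(hocr):
    """Return a list of (word, confidence) tuples found in an hOCR string."""
    parser = WordConfidenceParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Hypothetical hOCR fragment, for illustration only.
SAMPLE = (
    "<span class='ocr_line' title='bbox 0 0 100 20'>"
    "<span class='ocrx_word' title='bbox 0 0 40 20; x_wconf 93'>Hello</span> "
    "<span class='ocrx_word' title='bbox 50 0 100 20; x_wconf 71'>world</span>"
    "</span>"
)

if __name__ == "__main__":
    for word, conf in word_confidences(SAMPLE):
        print(word, conf)
```

Words whose `title` has no `x_wconf` entry are reported with a confidence of `None` rather than being dropped, so the caller can decide how to treat them.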

[libtesseract] output of get_available_builders() is incomplete

2

Looking at the code at https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/libtesseract/tesseract_raw.py#L381 and https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/libtesseract/__init__.py#L110, it appears that libtesseract supports the numeric mode. Why, then, does `get_available_builders` not allow us to use `DigitBuilder`?

wwqgtxx

bug

preserve_interword_spaces in tesseract

1

Hi team, I am currently using pyocr with Tesseract 3.05.01, and I use pyocr.get_available_tools() to get the Tesseract tool. Is there any way to set preserve_interword_spaces for Tesseract through pyocr?

anilnaik1988

support

Expose load_system_dawg for non textual output

3

`--load_system_dawg 0` would be helpful as an argument to image_to_text, perhaps passed through an options dictionary. Feel free to give it a name that keeps it language agnostic.

awiebe

feature request
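
The options-dictionary idea could look roughly like the sketch below, which turns a dictionary of Tesseract config variables into `-c NAME=VALUE` arguments for the `tesseract` command line. This is only an illustration: the helper names (`tesseract_config_args`, `build_command`) are hypothetical, nothing here is pyocr's actual API, and whether a given variable such as `load_system_dawg` takes effect when passed with `-c` may depend on the Tesseract version.

```python
def tesseract_config_args(options):
    """Turn {"name": value} into ["-c", "name=value", ...].

    Booleans are converted to 0/1, which is how Tesseract config
    variables are usually written on the command line.
    """
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = int(value)
        args += ["-c", "{}={}".format(name, value)]
    return args


def build_command(image_path, output_base, options=None, lang=None):
    """Build a tesseract argv list (hypothetical helper, not pyocr code)."""
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    cmd += tesseract_config_args(options or {})
    return cmd


if __name__ == "__main__":
    # Disable the system dictionary, e.g. for non textual output.
    print(build_command("page.png", "out", {"load_system_dawg": False}))
```

Keeping the dictionary keys identical to the Tesseract variable names avoids maintaining a separate mapping, at the cost of being specific to Tesseract; a tool-agnostic name would need a translation layer on top.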

Libtesseract: need stress-testing

7

Someone has been reporting crashes of Paperwork when running the OCR. They are using Tesseract 3.04.01, so there may be something wrong with the libtesseract binding. (Note: currently, the...

jflesch

bug

to study

Cuneiform: Split text areas before OCR

1

Cuneiform tends to stop reading a page when it reaches a large non-readable area. Because of this, when using Cuneiform, not all the keywords are actually extracted. A way to work...

jflesch

feature request

to study

hOCR: too much data is stripped

Even when using the LineBoxBuilder, it seems too much data is stripped from the hOCR files.

jflesch

feature request
