Results 384 comments of Jerome Flesch

1) Have you tried running tesseract from the command line directly? Do you get the same result? 2) You can try replacing `tool.image_to_string()` with `pyocr.libtesseract.image_to_string()`. You will probably...

You can simply create a builder object yourself. You can have a look at [CharBoxBuilder](https://github.com/jflesch/pyocr/blob/master/src/pyocr/tesseract.py#L57) for an example. Basically you just need to inherit from ```BaseBuilder``` and define ```tess_conf =...

Assuming you're using Tesseract (`pyocr.tesseract`) and not (`pyocr.libtesseract`) then yes, you can. You can make your own builder. See [DigitBuilder](https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/builders.py#L348) and the [other builders](https://github.com/openpaperwork/pyocr/blob/master/src/pyocr/builders.py) for reference. My suggestion: Inherit from...
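
As a rough illustration of the custom-builder approach, here is a minimal sketch modelled on how `DigitBuilder` extends `TextBuilder`. The class name `WhitelistBuilder` and its `allowed_chars` parameter are hypothetical, and the sketch assumes that the builder's `tesseract_flags` list is passed through to the `tesseract` command line. Whether `tessedit_char_whitelist` is honored depends on your Tesseract version and engine mode.

```python
try:
    from pyocr.builders import TextBuilder
except ImportError:
    # Stand-in so the sketch can still be read and run without pyocr installed.
    # It only mimics the two attributes used below.
    class TextBuilder:
        def __init__(self, tesseract_layout=3):
            self.tesseract_flags = ["--psm", str(tesseract_layout)]
            self.tesseract_configs = []


class WhitelistBuilder(TextBuilder):
    """Hypothetical builder that restricts OCR to a given set of characters."""

    def __init__(self, allowed_chars, tesseract_layout=3):
        super().__init__(tesseract_layout)
        # Assumption: these flags are appended to the tesseract command line.
        # "-c VAR=VALUE" is Tesseract's syntax for setting a config variable.
        self.tesseract_flags = list(self.tesseract_flags) + [
            "-c",
            "tessedit_char_whitelist=" + allowed_chars,
        ]
```

It would then be passed like any other builder, for example `tool.image_to_string(image, lang="eng", builder=WhitelistBuilder("0123456789"))`, with `tool` being `pyocr.tesseract`.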

I observed a similar problem when writing the tests. The outputs of libtesseract and tesseract (sh) are slightly different. Quite frankly, I'm not sure where this difference comes from. Currently,...

Good point. It might explain the differences. I'll have a look later.

You mean one confidence score for the OCR on the whole image? I'm not even sure whether Tesseract provides such a score.

Ok, good to know. For the words, I guess it can be added as an attribute to ```pyocr.builders.Box``` objects. Regarding the whole, with the current API, it's going to be...

For per-word confidence, you can thank @a-pagano: https://github.com/openpaperwork/pyocr/pull/86 :-)

Sorry, I meant to keep this ticket open regarding the confidence score for the whole page.

@a-pagano's changes have been released in PyOCR 0.5.
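
Until a page-level score exists, one rough workaround is to aggregate the per-word confidences yourself. The sketch below is only an assumption about how that could be done (a length-weighted mean), not something pyocr or Tesseract is known to provide. `Word` is a hypothetical stand-in for the word boxes, and the attribute names `content` and `confidence` are assumed.

```python
from collections import namedtuple

# Hypothetical stand-in for a word box; only the two fields used here.
Word = namedtuple("Word", ["content", "confidence"])


def page_confidence(words):
    """Return the length-weighted mean of per-word confidences.

    Longer words count for more than short ones. Returns None when there
    is no text at all, since an average would be meaningless.
    """
    total_chars = sum(len(w.content) for w in words)
    if total_chars == 0:
        return None
    weighted = sum(len(w.content) * w.confidence for w in words)
    return weighted / total_chars
```

A plain unweighted mean would work too. Weighting by length just keeps stray one-character detections from dominating the result.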