Cuneiform: Split text areas before OCR

Open jflesch opened this issue 13 years ago • 1 comments

Cuneiform tends to stop reading a page when it reaches a large non-readable area. Because of this, when using Cuneiform, not all of the keywords are actually extracted.

A way to work around this problem would be to split the text areas prior to OCR.

For instance, unpaper can do that (ocrfeeder uses it).
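
As a rough illustration of the splitting idea (not how unpaper or ocrfeeder implement it), here is a minimal sketch that cuts a page into horizontal bands wherever it finds enough blank rows. It assumes the page has already been binarized into a list of rows, where 1 means ink and 0 means background; the function name `split_text_bands` and the `min_gap` parameter are hypothetical and exist only for this example.

```python
def split_text_bands(rows, min_gap=2):
    """Split a binarized page into horizontal bands that contain ink.

    rows: list of rows, each a list of 0/1 pixels (1 = ink). Assumed input format.
    min_gap: number of consecutive blank rows needed to close a band.
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    """
    bands = []
    start = None      # first row of the band currently being built
    last_ink = None   # most recent row with ink in that band
    for y, row in enumerate(rows):
        if any(row):
            if start is None:
                start = y
            last_ink = y
        elif start is not None and y - last_ink >= min_gap:
            # Enough blank rows since the last ink: close the band.
            bands.append((start, last_ink + 1))
            start = None
    if start is not None:
        # The page ended while a band was still open.
        bands.append((start, last_ink + 1))
    return bands
```

Each band could then be cropped out of the image (for example with Pillow's `Image.crop`) and passed to the OCR tool separately, so that a failure on one non-readable area would not prevent the remaining bands from being read. A real implementation would also need to split along columns and cope with noise and skew.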

jflesch, Apr 25 '12 20:04

Note: Image processing algorithms should be added to https://github.com/jflesch/libpillowfight/, not to PyOCR directly.

jflesch, Oct 26 '16 15:10