Cuneiform: Split text areas before OCR

Open jflesch opened this issue 13 years ago • 1 comment

Cuneiform tends to stop reading a page when it reaches a large non-readable area. Because of this, when using Cuneiform, not all keywords are actually extracted.

A way to work around this problem would be to split the text areas prior to OCR.

For instance, unpaper can do that (ocrfeeder uses it).
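To make the idea concrete, here is a minimal sketch of one possible way to split a page into text areas: cut it into horizontal bands wherever several consecutive rows contain no ink. This is not what unpaper does and it is not part of PyOCR; the function name `split_text_bands`, its parameters, and the list-of-rows image representation are all assumptions made for illustration.

```python
def split_text_bands(rows, ink_threshold=128, min_gap=2):
    """Split a grayscale page into horizontal bands that contain ink.

    Hypothetical helper, not a PyOCR or unpaper API.
    rows: list of pixel rows, each a list of ints in 0..255 (0 = black).
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    A band ends once at least `min_gap` consecutive blank rows follow it.
    """
    bands = []
    start = None      # first row of the band being built, if any
    last_ink = None   # most recent row that contained ink
    for y, row in enumerate(rows):
        has_ink = any(px < ink_threshold for px in row)
        if has_ink:
            if start is None:
                start = y
            last_ink = y
        elif start is not None and y - last_ink >= min_gap:
            # Enough blank rows seen: close the current band.
            bands.append((start, last_ink + 1))
            start = None
    if start is not None:
        # The page ended while a band was still open.
        bands.append((start, last_ink + 1))
    return bands


WHITE = [255, 255, 255, 255]
INK = [255, 0, 0, 255]
page = [WHITE, INK, INK, WHITE, WHITE, WHITE, INK, WHITE]
print(split_text_bands(page))
```

Each band could then be cropped out of the page image (for example with Pillow's `Image.crop`) and passed to the OCR tool on its own, so that one unreadable area does not stop the rest of the page from being read. A real implementation would also need to split on columns and cope with noise and skew.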

Apr 25 '12 20:04 jflesch

Note: Image processing algorithms should be added to libpillowfight (https://github.com/jflesch/libpillowfight/), not to PyOCR directly.

Oct 26 '16 15:10 jflesch