pyocr
Cuneiform: Split text areas before OCR
Cuneiform tends to stop reading a page when it reaches a large non-readable area. Because of this, when using Cuneiform, not all the keywords are actually extracted.
A way to work around this problem would be to split the text areas prior to OCR.
For instance, unpaper can do that (ocrfeeder uses it).
Note: Image processing algorithms should be added to libpillowfight (https://github.com/jflesch/libpillowfight/), not to PyOCR directly.
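
To illustrate the idea of splitting text areas before OCR, here is a minimal sketch in plain Python. It is not what unpaper or libpillowfight does; it swaps in a much simpler technique (a horizontal projection profile) and assumes the page has already been binarized into a 2D list where 1 means a dark pixel. The function name `find_text_bands` and the `min_gap` parameter are hypothetical and only exist for this example.

```python
def find_text_bands(rows, min_gap=2):
    """Return (top, bottom) row ranges (bottom exclusive) that contain ink.

    rows: 2D list of 0/1 values, where 1 is a dark pixel (assumed input format).
    Bands separated by fewer than min_gap blank rows are merged into one.
    """
    bands = []
    start = None      # first row of the band currently being built
    last_ink = None   # most recent row that contained at least one dark pixel
    for y, row in enumerate(rows):
        if not any(row):
            continue  # blank row: nothing to record
        if start is None:
            start = y
        elif y - last_ink - 1 >= min_gap:
            # The blank gap is large enough: close the band and start a new one.
            bands.append((start, last_ink + 1))
            start = y
        last_ink = y
    if start is not None:
        bands.append((start, last_ink + 1))
    return bands


# Tiny hand-made "page": two inked regions separated by three blank rows.
page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
bands = find_text_bands(page, min_gap=2)
```

Each returned band could then be cropped out of the original image (for example with Pillow's `Image.crop`) and passed to the OCR tool on its own, so that a large non-readable area in one band would not stop the reading of the others. A real implementation would also need to split columns and cope with noise, which is the kind of work that belongs in libpillowfight.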