ocrd_anybaseocr
Stricter cropping
A DFG requirement when scanning is to show part of the opposite page. On some pages this is a problem: anybaseocr-crop does not crop away the text on that visible strip of the opposite page, so later tools detect text/characters where they shouldn't.
Here are two examples.

What would be a strategy to tackle this?
AFAICT this processor tries to avoid textual noise via separator line detection. There are a couple of (crappy and badly documented) parameters for this (rular...), but IMHO your best shot here would be trying to increase the contrast so the binarized image shows a distinct, contiguous vertical line where the gutter/spine is.
Besides binarization settings, there is a second workflow detail that might help: if you deskew before cropping, these lines should be easier to detect, since a separator that is truly vertical is simpler to pick up.
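To make the point about binarization concrete, here is a rough sketch of how the threshold decides whether the gutter shows up as one contiguous vertical line. This is not the code anybaseocr-crop uses; the function names (`column_coverage`, `find_separator_column`) and the `threshold` / `min_coverage` values are made up for illustration, and it assumes a grayscale page as a 2-D NumPy array that has already been deskewed.

```python
import numpy as np


def column_coverage(gray, threshold):
    """Fraction of dark pixels per column after a simple global threshold.

    gray: 2-D uint8 array (0 = black, 255 = white).
    threshold: hypothetical global cut-off; pixels below it count as ink.
    """
    binary = gray < threshold          # True where the pixel counts as ink
    return binary.mean(axis=0)         # share of ink pixels in each column


def find_separator_column(gray, threshold=128, min_coverage=0.8):
    """Return the column that looks most like a vertical separator, or None.

    A column qualifies only if at least `min_coverage` of its pixels are ink,
    i.e. the line is (nearly) contiguous from top to bottom.
    """
    coverage = column_coverage(gray, threshold)
    col = int(np.argmax(coverage))
    return col if coverage[col] >= min_coverage else None
```

With a faint gutter shadow, a conservative threshold leaves the line out of the binarized image entirely, while a higher one recovers it. That is the effect you would be after when tuning contrast or binarization ahead of the cropper. The same check gets weaker on a skewed page, because a tilted line spreads over several columns.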
@beckstefan is this gone with the reimplementation of the cropper?
(If you could post or link to the originals, I could run it...)