
RFC: A standard --validate option

Open mikegerber opened this issue 5 years ago • 2 comments

Another idea that came up in https://github.com/OCR-D/ocrd_olena/issues/60: I routinely run validation after each processor to catch problems early. If there were a standard --validate option in core (supplemented by a config file that sets e.g. the --skip options), this pattern:

ocrd-olena-binarize --overwrite -I $INPUT_FILE_GRP -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split
ocrd workspace validate $validate_options
                                                                                                                                                              
ocrd-sbb-textline-detector --overwrite -I OCR-D-IMG-BINPAGE -O OCR-D-SEG-LINE -P model /var/lib/textline_detection
ocrd workspace validate $validate_options

would simplify to:

ocrd-olena-binarize --validate --overwrite -I $INPUT_FILE_GRP -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split
ocrd-sbb-textline-detector --validate --overwrite -I OCR-D-IMG-BINPAGE -O OCR-D-SEG-LINE -P model /var/lib/textline_detection

I think some kind of configuration for this hypothetical option is absolutely required. For example, I use these options to make routine validation useful for me:

    --skip dimension
    --skip pixel_density
    --page-strictness lax
    --page-coordinate-consistency off
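
Until something like this exists in core, a small shell wrapper can approximate the behaviour. This is only a minimal sketch: `run_validated` and `VALIDATE_OPTIONS` are hypothetical names, not part of OCR-D, and it assumes the command is run from inside the workspace directory so that `ocrd workspace validate` picks up the right METS file.

```shell
# Hypothetical helper (not part of OCR-D core): run one processor,
# then validate the workspace with a fixed set of options.

# The options listed above; kept in a plain string so this works in POSIX sh.
VALIDATE_OPTIONS="--skip dimension --skip pixel_density --page-strictness lax --page-coordinate-consistency off"

run_validated() {
    # Run the processor command exactly as given; stop if it fails.
    "$@" || return $?
    # Word splitting of $VALIDATE_OPTIONS is intentional here.
    # shellcheck disable=SC2086
    ocrd workspace validate $VALIDATE_OPTIONS
}
```

With that in place, each step becomes e.g. `run_validated ocrd-olena-binarize --overwrite -I $INPUT_FILE_GRP -O OCR-D-IMG-BINPAGE,OCR-D-IMG-BIN -P impl sauvola-ms-split`. It still has to be set up per script, which is why a built-in option with shared configuration would be nicer.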

mikegerber, Jul 30 '20 18:07

See also #557 for ideas regarding configuration.

mikegerber, Aug 07 '20 14:08

That's exactly what I proposed here a while ago:

Perhaps we should start adding other mechanisms that affect all processors equally (like the loglevel override): ... Or supporting automatic workspace validation with different levels/sets of checks.

bertsky, Aug 14 '20 21:08