workspace validator: allow files not representing a single page
Currently, the workspace validator will complain about any kind of file (including derived images) that is not contained in the structMap as a physical page.
IMO this is an error on the side of the validator: Only files representative of a complete page (like derived images processed on the page level, but not on the region, line or word level) should have a pageId.
Update:
> Currently, the workspace validator will complain about any kind of file (including derived images) that is not contained in the structMap as a physical page.
That's still the case (even after OCR-D/spec#151 and OCR-D/spec#164); the error reported is "does not manifest any physical page".
> IMO this is an error on the side of the validator: Only files representative of a complete page (like derived images processed on the page level, but not on the region, line or word level) should have a pageId.
The above spec changes clarified that derived images are always to be attributed to a physical page, but we now do have the valid case of global files (representative of the whole document).
Example of this bug:
<error>File 'FULLDOWNLOAD' does not manifest any physical page.</error>
Can we at least get a skip flag for this, @kba?
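To make the request concrete, here is a rough sketch of what such a skip option might do. It is not the actual validator code in core: the function name `files_without_page`, the `skip_file_grps` parameter and the sample METS (including the `DOWNLOAD` fileGrp) are all made up for illustration, and it only uses the standard library.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Made-up minimal METS: one page image plus one document-level ("global") file.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="IMG_0001"/>
    </mets:fileGrp>
    <mets:fileGrp USE="DOWNLOAD">
      <mets:file ID="FULLDOWNLOAD"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def files_without_page(mets_xml, skip_file_grps=()):
    """Return IDs of mets:file entries not referenced by any physical page.

    Files in a fileGrp whose USE is listed in skip_file_grps are exempt
    (hypothetical skip option, not an existing validator parameter).
    """
    root = ET.fromstring(mets_xml)

    # Collect every FILEID referenced from a page div of the physical structMap.
    referenced = set()
    page_path = ".//mets:structMap[@TYPE='PHYSICAL']//mets:div[@TYPE='page']"
    for page in root.iterfind(page_path, NS):
        for fptr in page.iterfind("mets:fptr", NS):
            referenced.add(fptr.get("FILEID"))

    # Report files that no page points to, unless their group is skipped.
    missing = []
    for grp in root.iterfind(".//mets:fileSec/mets:fileGrp", NS):
        if grp.get("USE") in skip_file_grps:
            continue
        for file_el in grp.iterfind("mets:file", NS):
            if file_el.get("ID") not in referenced:
                missing.append(file_el.get("ID"))
    return missing


print(files_without_page(SAMPLE_METS))
print(files_without_page(SAMPLE_METS, skip_file_grps=("DOWNLOAD",)))
```

Without the skip list, `FULLDOWNLOAD` is reported; with `DOWNLOAD` skipped, nothing is. Skipping per fileGrp is just one possible granularity, a plain on/off flag for the whole check would work for me too.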