
workspace validator: allow files not representing a single page

Open bertsky opened this issue 5 years ago • 2 comments

Currently, the workspace validator will complain about any kind of file (including derived images) that is not referenced by a physical page in the structMap.

IMO this is an error on the side of the validator: Only files representative of a complete page (like derived images processed on the page level, but not on the region, line or word level) should have a pageId.

bertsky avatar May 18 '20 06:05 bertsky

Update:

> Currently, the workspace validator will complain about any kind of file (including derived images) that is not referenced by a physical page in the structMap.

That's still the case (even after OCR-D/spec#151 and OCR-D/spec#164); the error reported is "does not manifest any physical page".

> IMO this is an error on the side of the validator: Only files representative of a complete page (like derived images processed on the page level, but not on the region, line or word level) should have a pageId.

The spec changes above clarified that derived images are always to be attributed to a physical page, but we now do have the valid case of global files (representative of the whole document).

bertsky avatar Sep 16 '20 19:09 bertsky

Example of this bug:

<error>File 'FULLDOWNLOAD' does not manifest any physical page.</error>

Can we at least get a skip flag for this, @kba?

bertsky avatar Sep 27 '22 21:09 bertsky
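
To illustrate the kind of skip flag requested above, here is a minimal, standalone sketch. It does not use or reproduce the actual OCR-D validator code: the function `files_without_page`, its `skip_file_groups` parameter, the toy METS document, and the `DOWNLOAD` file group name are all hypothetical and exist only for this example. It collects every file ID referenced by a physical page in the structMap and reports the remaining files, unless their file group is explicitly skipped.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Toy METS document, made up for illustration: one page image
# plus one "global" file that represents the whole document.
METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="IMG_0001"/>
    </mets:fileGrp>
    <mets:fileGrp USE="DOWNLOAD">
      <mets:file ID="FULLDOWNLOAD"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="IMG_0001"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def files_without_page(mets_xml, skip_file_groups=()):
    """Report files not referenced by any physical page.

    Hypothetical helper, not part of OCR-D core. File groups whose USE
    attribute is listed in skip_file_groups are not checked at all.
    """
    root = ET.fromstring(mets_xml)
    # IDs of all files that some div in the physical structMap points to
    paged = {
        fptr.get("FILEID")
        for fptr in root.findall(
            ".//mets:structMap[@TYPE='PHYSICAL']//mets:fptr", NS
        )
    }
    errors = []
    for grp in root.findall(".//mets:fileGrp", NS):
        if grp.get("USE") in skip_file_groups:
            continue  # global files are allowed to have no page
        for mets_file in grp.findall("mets:file", NS):
            file_id = mets_file.get("ID")
            if file_id not in paged:
                errors.append(
                    "File '%s' does not manifest any physical page." % file_id
                )
    return errors


print(files_without_page(METS))
print(files_without_page(METS, skip_file_groups=("DOWNLOAD",)))
```

Skipping by file group is only one possible design; a real fix might instead treat any file without a page assignment as a global file, which is closer to what the first comment argues for.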