spec
Specification of the @OCR-D technical architecture, interface definitions and data exchange format(s)
Currently, we only specify how to describe the hierarchy of pages (represented by a set of files under `mets:structMap/mets:div/mets:div`) and their order. But nothing so far on logical structure **across...
It would be very useful to introduce a new `mets:fileGrp` specifically for ground truth datasets. This new group would include both region-level and line-level segmentations. Suggested name: ```xml ```
While checking https://github.com/OCR-D/core/pull/1066, I noticed that we have the rule in the validator but AFAICT not in the specs that the `@pcGtsId` of a PAGE document should be the same...
We briefly talked about those in the Tech Call today and decided to make these part of the spec, hence this PR. I took the liberty of updating the list...
instead of https://github.com/OCR-D/ocrd-website/pull/354
The specification currently makes no suggestion on how to deal with more than one consecutive whitespace character.
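To make the case concrete, here is a minimal sketch that only locates runs of two or more consecutive whitespace characters in a text string. It does not propose how such runs should be handled, since the specification does not say. The helper name `find_whitespace_runs` and the sample string are hypothetical and not part of the OCR-D spec or tooling.

```python
import re

# Hypothetical helper, not part of the OCR-D spec or of ocrd core:
# it only reports where runs of 2+ whitespace characters occur.
MULTI_WS = re.compile(r"\s{2,}")


def find_whitespace_runs(text):
    """Return (start, end) offsets of every run of 2+ whitespace characters."""
    return [match.span() for match in MULTI_WS.finditer(text)]


# Illustrative input: a double space and a space-tab-space sequence.
runs = find_whitespace_runs("Lorem  ipsum \t dolor sit")
print(runs)
```

Whether such runs should be preserved, collapsed, or rejected is exactly the open question raised here; the sketch only shows how they could be detected.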
From my and @bertsky's discussion at https://github.com/qurator-spk/eynollah/issues/67: >> Yes, it should be possible to skip pages marked as certain types in the logical structmap – not just in any one...
According to this [discussion on the Processing Server implementation](https://github.com/OCR-D/core/pull/974#discussion_r1138901846), we should simplify here. (But it must be clear at all times what is a workflow job ID and what is...