Konstantin Baierer
While checking https://github.com/OCR-D/core/pull/1066, I noticed that we have the rule in the validator but AFAICT not in the specs that the `@pcGtsId` of a PAGE document should be the same...
We briefly talked about those in the Tech Call today and decided to make these part of the spec, hence this PR. I took the liberty of updating the list...
More references you might want to include:

- [Leifert & Labahn 2019: End-to-End Measure for Text Recognition](https://ieeexplore.ieee.org/abstract/document/8978155) (on CER and derived metrics, analysis of reading order, segmentation and geometry influences)...
_Originally posted by @bertsky in https://github.com/OCR-D/spec/pull/225#discussion_r1086173671_

> Speaking of: IMHO it would be quite relevant to offer a CER metric under level-2 (or even level-1) equivalency. Not exclusively (because this...
During debugging bertsky/ocrd_detectron2#14 I realized that my assumption that every archive would only contain a single resource was wrong. The detectron2 models consist of a pytorch NN and a YAML...
We forgot to formally specify the behavior here, cf. OCR-D/core#929
The meta-documentation is a bit meagre at the moment; adding (better) descriptions to all fields will help implementers write more expressive `ocrd-tool.json` files. https://github.com/OCR-D/spec/pull/121#discussion_r309177809 ff.
We need to specify how these constructs are related, which one to use, how to handle contradictions.
Basis for #134 and OCR-D/core#376. Will require a new major version and some seriously frantic pull requesting to all the processors, but it's worth it for sustainability IMHO.