Robert Sachunsky
IMO we are still lacking a convention to represent illegible substrings. DTABf (TEI) uses [gap](https://deutschestextarchiv.de/doku/basisformat/gapSupplied.html) for this. Since there is a dependency from GT to OCR training to OCR inference...
When we had the [original discussions about a new workflow format](https://github.com/OCR-D/spec/pull/208) to replace the de-facto standard `ocrd process` syntax in the core implementation, there was a general understanding that the...
Since #206 we can moderately restrict the `properties` of an ocrd-tool `parameter` of type `object`, and even set `additionalProperties`: https://github.com/OCR-D/spec/blob/506b33936d89080a683fa8a26837f2a23b23e5e2/ocrd_tool.schema.yml#L95-L100 However, according to [the JSON data schema](https://json-schema.org/understanding-json-schema/reference/object.html), `additionalProperties` is either `false` or...
This is somewhat already part of #116 but I would like to see a discussion for the specific problem that dewarping poses to the coordinate reproducibility principle. Now that we...
We all know we need some form of [quality estimates to control where computation is spent and what workflow steps are used](https://www.dfg.de/download/pdf/foerderung/programme/lis/absichtserklaerungen_ocrd_2020/leipzig_dresden.pdf). ## External quality control One might consider this...
The spec should be more specific about how `AlternativeImage` must be used. There are issues of _coordinate reproducibility_ and _disambiguation_, and we need another `@comments` class `rescaled`. See [here](https://github.com/OCR-D/ocrd_tesserocr/issues/33) for...
IMO there is a large, still unmet demand in OCR-D for image preprocessing tools to 1. color-normalize raw images (i.e. linear or non-linear contrast stretching, gamma correction) 2. denoise raw...
The current OCR-D spec has a completely flat hierarchy of PAGE-XML segments. However, there is a large demand for at least mildly recursive regions for: 1. paragraphs inside text regions...
In mets.md, the following has been stated since the very first version: > Every processing step that generates new images and changes their dimensions MUST make sure to adapt the...