Robert Sachunsky
Regarding `input_file_grp` and `output_file_grp`: the latter has already been made non-required. However, that's not because we no longer see much use in specifying fileGrp names statically, but rather because...
Option 2 is problematic as proposed: the [current schema](https://github.com/PRImA-Research-Lab/PAGE-XML/blob/48845ea9fad4dc682bf6aa3905a2f3e6f886e326/pagecontent/schema/pagecontent.xsd#L76) does not allow arbitrary `type` strings, so one would have to use `type="other" name="model-used" value="frm"` instead. And another variant of option...
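For illustration, here is a minimal sketch of the schema-conformant encoding using Python's `xml.etree.ElementTree`. Two points are assumptions, not taken from the discussion: that the attributes live on a PAGE `MetadataItem` element, and that the 2019-07-15 PAGE namespace applies.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE 2019-07-15 namespace; MetadataItem is our guess at the host element.
PC_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

def model_used_item(model_name: str) -> ET.Element:
    """Encode 'model used' via type="other" plus free-form name/value attributes."""
    item = ET.Element(f"{{{PC_NS}}}MetadataItem")
    item.set("type", "other")       # a free-form type="model-used" would not validate
    item.set("name", "model-used")  # the actual key goes into @name instead
    item.set("value", model_name)
    return item

if __name__ == "__main__":
    ET.register_namespace("pc", PC_NS)
    print(ET.tostring(model_used_item("frm"), encoding="unicode"))
```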
Since https://github.com/OCR-D/core/pull/747, the METS part has also been implemented in a general way (and should be standardized).
> * Currently, provenance information is stored in the `mets:agent/mets:note` elements, as well as in the ALTO XML `Processing/processingStepSettings` elements (kitodo_production_ocrd).

The latter is true only because our page-to-alto converter...
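As a rough sketch of where the METS side of that provenance sits, the following reads each `mets:agent` in the METS header together with its `mets:note` texts. It assumes agents are placed under `mets:metsHdr` (as the METS schema requires) and makes no assumption about the format of the note contents; the helper name `agent_provenance` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

def agent_provenance(mets_root):
    """Return (agent name, [note texts]) for every mets:agent in the METS header."""
    result = []
    for agent in mets_root.iterfind("mets:metsHdr/mets:agent", NS):
        # mets:name identifies the agent (e.g. the processor); default to "" if absent
        name = agent.findtext("mets:name", default="", namespaces=NS)
        # note contents are returned verbatim; their internal format is not interpreted
        notes = [note.text or "" for note in agent.findall("mets:note", NS)]
        result.append((name, notes))
    return result
```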
> Possibly fixed by #154

Superseded by #207, but unrelated AFAICS.

> For the main purposes of OCR-D we should avoid (modifying) the depths of METS/MODS library style structural tagging...
> Even though they are related, I think we should separate the (more complex) issue of support for logical structMaps from document-wide files.

Agreed. One at a time.

> Currently,...
> > > > missing colour_checker.
> >
> > That's a custom type used at SBB, invented by @maria-federbusch. I was surprised to see it in [mets-mods2tei](https://github.com/slub/mets-mods2tei/blob/47f5bc283628438673cff5976b5af07b46790437/mets_mods2tei/api/tei.py#L842), but not in [kitodo.presentation](https://github.com/kitodo/kitodo-presentation/blob/9113a5e647b9142dde98e09a03a18d53d99bb077/Resources/Private/Language/NewTenant.xml)...
One might think of an additional CLI option, say `-G, --page-type`, matching `mets:structMap[@TYPE="LOGICAL"]//mets:div/@TYPE` of pages in that range of the `mets:structLink` (if any), perhaps even with `//`-prefixed regexes. But practically,...
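To make the idea concrete, here is a sketch of the selection such a `--page-type` option might perform. It is a hypothetical helper, not part of OCR-D core, and rests on these assumptions: a pattern starting with `//` is treated as a regular expression (full match), anything else as an exact `@TYPE`; pages linked to nested logical divisions count as belonging to the enclosing match; and physical `mets:div/@ID` values serve as page IDs.

```python
import re
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
DIV = "{http://www.loc.gov/METS/}div"
XLINK = "{http://www.w3.org/1999/xlink}"

def type_matches(div_type, pattern):
    """'//'-prefixed pattern: regex (full match); otherwise exact comparison."""
    if pattern.startswith("//"):
        return re.fullmatch(pattern[2:], div_type) is not None
    return div_type == pattern

def pages_of_logical_type(mets_root, pattern):
    """Physical page IDs linked (via mets:structLink) to logical divs of a matching @TYPE."""
    logical_ids = set()
    for smap in mets_root.findall("mets:structMap[@TYPE='LOGICAL']", NS):
        for div in smap.iter(DIV):
            if type_matches(div.get("TYPE", ""), pattern):
                # include the matching div itself and all divs nested inside it
                logical_ids.update(sub.get("ID") for sub in div.iter(DIV))
    physical_ids = {
        link.get(XLINK + "to")
        for link in mets_root.iterfind("mets:structLink/mets:smLink", NS)
        if link.get(XLINK + "from") in logical_ids
    }
    # report pages in the order of the physical structMap
    return [
        div.get("ID")
        for smap in mets_root.findall("mets:structMap[@TYPE='PHYSICAL']", NS)
        for div in smap.iter(DIV)
        if div.get("TYPE") == "page" and div.get("ID") in physical_ids
    ]
```

Resolving through `mets:structLink` rather than the logical structMap alone is what makes this work at all: logical divisions only reference pages via `smLink` pairs, so without a `structLink` the selection is simply empty.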
@M3ssman on the OCR-D Forum you said that you have a workflow to do page selection based on the logical structMap **externally** (independent of OCR-D) – could you elaborate here?
OK, so in principle it's clear that if you use the split recipe (dividing up the METS into single-page workspaces to be processed in parallel), then it is easy to...
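The partitioning behind such a split can be sketched as follows. This is not the split recipe itself. It only computes which files (`mets:fptr/@FILEID`) belong to each physical page and then runs a caller-supplied, hypothetical `process_page(page_id, file_ids)` job per page in parallel. Writing out an actual single-page METS per workspace is omitted.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

METS = "{http://www.loc.gov/METS/}"

def files_per_page(mets_root):
    """Map each physical page ID to the file IDs referenced by its mets:fptr children."""
    pages = {}
    for smap in mets_root.iter(METS + "structMap"):
        if smap.get("TYPE") != "PHYSICAL":
            continue
        for div in smap.iter(METS + "div"):
            if div.get("TYPE") == "page":
                pages[div.get("ID")] = [fptr.get("FILEID") for fptr in div.findall(METS + "fptr")]
    return pages

def run_split(mets_root, process_page, max_workers=4):
    """Run process_page(page_id, file_ids) for every page in parallel; results keyed by page."""
    split = files_per_page(mets_root)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the dict's keys
        results = pool.map(lambda item: process_page(*item), split.items())
        return dict(zip(split, results))
```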