Display document page metadata
ALTO files contain metadata like this:
<OCRProcessing ID="IdOcr">
  <ocrProcessingStep>
    <processingDateTime>2014-05-21</processingDateTime>
    <processingSoftware>
      <softwareCreator>ABBYY</softwareCreator>
      <softwareName>ABBYY FineReader Engine</softwareName>
      <softwareVersion>11</softwareVersion>
    </processingSoftware>
  </ocrProcessingStep>
</OCRProcessing>
The report should display it.
This would be very useful!
Unfortunately, this will only work for ALTO, since PAGE-XML has no such provenance; there one has to fall back on the METS container instead.
Also note that the <OCRProcessing> structure has been changed to <Processing> and heavily modified as of ALTO version 4.0.
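To make the ALTO side concrete, here is a minimal sketch of how the report could collect these fields. It is only an illustration, not the project's code: the function name `alto_processing_steps` and the helper `local` are made up for this example, and it assumes that any element directly containing `processingDateTime` or `processingSoftware` counts as one processing step. Under that assumption the same loop should cover both the older `<OCRProcessing>` layout and the `<Processing>` element of ALTO 4.0, and namespaces are ignored so the ALTO version does not matter.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def alto_processing_steps(xml_text):
    """Return one dict per processing step found in an ALTO document.

    Assumption: a "step" is any element that directly contains
    processingDateTime or processingSoftware.
    """
    root = ET.fromstring(xml_text)
    steps = []
    for el in root.iter():
        children = {local(c.tag): c for c in el}
        if "processingSoftware" not in children and "processingDateTime" not in children:
            continue
        step = {"step": local(el.tag)}
        date = children.get("processingDateTime")
        if date is not None and date.text:
            step["processingDateTime"] = date.text.strip()
        software = children.get("processingSoftware")
        if software is not None:
            # softwareCreator, softwareName, softwareVersion, ...
            for field in software:
                if field.text:
                    step[local(field.tag)] = field.text.strip()
        steps.append(step)
    return steps
```

A file without any processing information simply yields an empty list, so the report can skip the section in that case.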
For PAGE files:
<pc:Metadata>
  <pc:Creator>OCR-D/core 2.17.0</pc:Creator>
  <pc:Created>2020-10-02T09:13:28</pc:Created>
  <pc:LastChange>2020-10-02T09:13:28</pc:LastChange>
  <pc:MetadataItem type="processingStep" name="preprocessing/optimization/binarization" value="ocrd-olena-binarize">
    <pc:Labels>
      <pc:Label value="sauvola-ms-split" type="impl"/>
      <pc:Label value="0.34" type="k"/>
      <pc:Label value="0" type="win-size"/>
      <pc:Label value="0" type="dpi"/>
    </pc:Labels>
  </pc:MetadataItem>
  <pc:MetadataItem type="processingStep" name="layout/segmentation/region" value="ocrd-sbb-textline-detector">
    <pc:Labels externalModel="ocrd-tool" externalId="parameters">
      <pc:Label value="/var/lib/textline_detection" type="model"/>
    </pc:Labels>
  </pc:MetadataItem>
  <pc:MetadataItem type="processingStep" name="recognition/text-recognition" value="ocrd-calamari-recognize">
    <pc:Labels externalModel="ocrd-tool" externalId="parameters">
      <pc:Label value="/var/lib/calamari-models/GT4HistOCR/2019-07-22T15_49+0200/*.ckpt.json" type="checkpoint"/>
      <pc:Label value="glyph" type="textequiv_level"/>
      <pc:Label value="confidence_voter_default_ctc" type="voter"/>
      <pc:Label value="0.001" type="glyph_conf_cutoff"/>
    </pc:Labels>
  </pc:MetadataItem>
</pc:Metadata>
But note that only PAGE files produced by OCR-D include this information; I am not aware of any other tool producing PAGE output that currently populates this section in this way.
Yeah, if it's not there it will not be displayed.
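For the PAGE side, a rough sketch of that "display it only if present" behaviour could look like the following. Again, this is an illustration under assumptions rather than the actual implementation: `page_metadata` and `local` are hypothetical names, only `MetadataItem` elements with `type="processingStep"` are collected, and labels are flattened into a plain `type -> value` mapping (so a repeated label type within one step would overwrite the earlier one).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def page_metadata(xml_text):
    """Return PAGE metadata as a dict, or None if there is no Metadata element."""
    root = ET.fromstring(xml_text)
    meta = next((el for el in root.iter() if local(el.tag) == "Metadata"), None)
    if meta is None:
        return None
    result = {"steps": []}
    for child in meta:
        name = local(child.tag)
        if name in ("Creator", "Created", "LastChange") and child.text:
            result[name] = child.text.strip()
        elif name == "MetadataItem" and child.get("type") == "processingStep":
            # Flatten <Label type=... value=.../> into a dict of parameters.
            labels = {
                label.get("type"): label.get("value")
                for label in child.iter()
                if local(label.tag) == "Label"
            }
            result["steps"].append({
                "name": child.get("name"),
                "value": child.get("value"),
                "parameters": labels,
            })
    return result
```

PAGE files from tools that do not write processing steps would then produce an empty `steps` list, and the report would show only the creator and timestamps.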