iis
Provide a specific version in provenance for Grobid
Apart from the publication metadata, the TEI record produced by Grobid also includes the version of Grobid that created the given TEI XML record:
<encodingDesc>
  <appInfo>
    <application version="0.8.2-SNAPSHOT" ident="GROBID" when="2025-11-05T16:18+0000">
      <desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
      <ref target="https://github.com/kermitt2/grobid"/>
    </application>
  </appInfo>
</encodingDesc>
which is part of the <teiHeader> element.
This is convenient: it allows us to replace the currently defined, rather generic provenance value (set to grobid) with a more specific, versioned one by relying on the XPath expression //tei:teiHeader/tei:encodingDesc/tei:appInfo/tei:application/@version. We could also rely on the @ident attribute to get the root GROBID name instead of hardcoding it.
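
To illustrate the idea, here is a minimal Python sketch of how the version and ident could be read from a TEI record and combined into a provenance value. It is not the project's implementation: the function name `build_provenance`, the `ident:version` output format, the lowercasing of `@ident`, and the fallback to the generic `grobid` value are all assumptions made for illustration. The `tei:` prefix is assumed to be bound to the standard TEI namespace `http://www.tei-c.org/ns/1.0`. Because `xml.etree.ElementTree` supports only a subset of XPath and cannot select attributes directly, the sketch locates the `<application>` element and reads its attributes.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed to be what the `tei:` prefix refers to.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Path to the <application> element; attributes are read from it afterwards.
APPLICATION_PATH = ".//tei:teiHeader/tei:encodingDesc/tei:appInfo/tei:application"


def build_provenance(tei_xml: str, fallback: str = "grobid") -> str:
    """Return a provenance value such as 'grobid:0.8.2-SNAPSHOT'.

    The 'ident:version' format is hypothetical. If the <application>
    element is missing, the generic fallback value is returned.
    """
    root = ET.fromstring(tei_xml)
    app = root.find(APPLICATION_PATH, TEI_NS)
    if app is None:
        return fallback
    # Use @ident instead of a hardcoded name; lowercasing is an assumption
    # made to stay close to the current generic value.
    ident = (app.get("ident") or fallback).lower()
    version = app.get("version")
    return f"{ident}:{version}" if version else ident


# Sample record wrapping the snippet from this issue in a minimal TEI root.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <appInfo>
        <application version="0.8.2-SNAPSHOT" ident="GROBID" when="2025-11-05T16:18+0000">
          <desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
          <ref target="https://github.com/kermitt2/grobid"/>
        </application>
      </appInfo>
    </encodingDesc>
  </teiHeader>
</TEI>"""

if __name__ == "__main__":
    print(build_provenance(SAMPLE))  # prints "grobid:0.8.2-SNAPSHOT"
```

Falling back to the generic value when `<application>` is absent would keep records from Grobid output that lacks this block usable; whether that is the desired behavior is open for discussion.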