iis

Provide a specific version in provenance for Grobid

Open marekhorst opened this issue 1 month ago • 0 comments

The TEI record produced by Grobid includes, apart from the publication metadata, the version of Grobid that created the given TEI XML record:

<encodingDesc>
    <appInfo>
        <application version="0.8.2-SNAPSHOT" ident="GROBID" when="2025-11-05T16:18+0000">
            <desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
            <ref target="https://github.com/kermitt2/grobid"/>
        </application>
    </appInfo>
</encodingDesc>

which is part of the <teiHeader> element.

This is quite convenient: it allows us to replace the currently defined, rather generic provenance value (set to grobid) with a more specific, versioned one by relying on the XPath //tei:teiHeader/tei:encodingDesc/tei:appInfo/tei:application/@version. We could also rely on the @ident attribute on the same element to get the root GROBID name instead of hardcoding it.
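A minimal sketch of the idea, in Python for illustration only (it is not the project's implementation). It assumes the `tei` prefix is bound to the standard TEI namespace `http://www.tei-c.org/ns/1.0`. The function name `grobid_provenance`, the `<ident>-<version>` output format, and the fallback to the generic `grobid` value are all hypothetical choices, not something the issue specifies. Because `xml.etree.ElementTree` does not support attribute steps such as `/@version`, the sketch selects the `application` element and reads its attributes directly.

```python
import xml.etree.ElementTree as ET

# Assumption: "tei" maps to the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# Path to the element carrying the @version and @ident attributes.
APP_PATH = ".//tei:teiHeader/tei:encodingDesc/tei:appInfo/tei:application"
# The generic provenance value currently in use, kept as a fallback.
DEFAULT_PROVENANCE = "grobid"


def grobid_provenance(tei_xml: str) -> str:
    """Build a versioned provenance value from a TEI record (hypothetical format)."""
    root = ET.fromstring(tei_xml)
    app = root.find(APP_PATH, TEI_NS)
    if app is None:
        return DEFAULT_PROVENANCE
    ident = app.get("ident")
    version = app.get("version")
    if not ident or not version:
        return DEFAULT_PROVENANCE
    # Hypothetical format: lowercased tool name, a dash, then the version.
    return f"{ident.lower()}-{version}"


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <appInfo>
        <application version="0.8.2-SNAPSHOT" ident="GROBID" when="2025-11-05T16:18+0000">
          <desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
          <ref target="https://github.com/kermitt2/grobid"/>
        </application>
      </appInfo>
    </encodingDesc>
  </teiHeader>
</TEI>"""

print(grobid_provenance(SAMPLE))  # prints "grobid-0.8.2-SNAPSHOT"
```

Falling back to the generic value keeps records without an `appInfo` section usable; whether that or a hard failure is preferable would depend on how provenance is consumed downstream.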

marekhorst avatar Nov 05 '25 16:11 marekhorst