core OcrdMets: add generateDS model of MODS as new OcrdMods class

For processors consuming MODS metadata, it would help (as in: easier and more efficient code) to be able to use a Python object model. For example, querying language or script via XPath is painful.

The interface could be something like ocrd_mets.OcrdMets.dmdSec (as a dict of IDs to ocrd_mods.OcrdMods instances).
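To make the idea concrete, here is a rough sketch of the dict-of-IDs interface using only the standard library instead of generateDS. `SimpleMods` and `dmd_sections` are hypothetical names for illustration (nothing like them exists in ocrd_models as far as this issue is concerned), and the sample METS is made up:

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Made-up minimal METS with a single MODS descriptive metadata section
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                      xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0001">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:recordInfo>
            <mods:recordIdentifier source="example">PPN123</mods:recordIdentifier>
          </mods:recordInfo>
          <mods:language>
            <mods:languageTerm type="code" authority="iso639-2b">ger</mods:languageTerm>
            <mods:scriptTerm type="code" authority="iso15924">Latf</mods:scriptTerm>
          </mods:language>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
</mets:mets>"""


class SimpleMods:
    """Hypothetical stand-in for an OcrdMods object model (not generateDS)."""

    def __init__(self, element):
        self._el = element

    def _texts(self, path):
        # Collect the stripped text of all elements matching the path
        return [el.text.strip() for el in self._el.findall(path, NS) if el.text]

    @property
    def record_identifier(self):
        ids = self._texts("mods:recordInfo/mods:recordIdentifier")
        return ids[0] if ids else None

    @property
    def languages(self):
        return self._texts("mods:language/mods:languageTerm")

    @property
    def scripts(self):
        return self._texts("mods:language/mods:scriptTerm")


def dmd_sections(mets_root):
    """Map each dmdSec ID to a SimpleMods wrapper (MODS sections only)."""
    result = {}
    for sec in mets_root.findall("mets:dmdSec", NS):
        mods = sec.find("mets:mdWrap[@MDTYPE='MODS']/mets:xmlData/mods:mods", NS)
        if mods is not None:
            result[sec.get("ID")] = SimpleMods(mods)
    return result


dmd = dmd_sections(ET.fromstring(SAMPLE_METS))
mods = dmd["DMDLOG_0001"]
print(mods.record_identifier, mods.languages, mods.scripts)
```

A generateDS-based class would expose the full MODS schema rather than these three hand-picked properties; the point here is only the shape of the access pattern.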

Remotely related: #783

Oct 24 '22 14:10 bertsky

@bertsky in https://github.com/OCR-D/core/pull/966#pullrequestreview-1261544355 (posting here so it does not get lost when resolving that discussion):

Moreover, what about MODS queries? ATM it's only a minor use-case (ocrd-segment-extract-lines wants to know the mods:recordIdentifier). But IIUC this will be the only way processors can query meta-data (whether passed from manual input or previous processors). So IMO we must (at some point, not necessarily right now) provide some OcrdMods and wrap that object via HTTP as well, e.g. in OcrdMets:
@property
def mods(self):
    return parsexml(...)
and then wrapping a /mods entry point in OcrdMetsServer and then in ClientSideOcrdMets:
@property
def mods(self):
    r = self.session.request('GET', f'{self.url}/mods')
    return r.json()
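Since the client side would return `r.json()`, the server would have to serialize the MODS into something JSON-friendly first. One possible shape for that payload, purely as an assumption: `mods_to_dict` and its field names are hypothetical, the sample MODS is made up, and no actual HTTP is involved here, only the round trip through JSON:

```python
import json
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up minimal MODS record
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:recordInfo>
    <mods:recordIdentifier>PPN123</mods:recordIdentifier>
  </mods:recordInfo>
  <mods:language>
    <mods:languageTerm type="code">ger</mods:languageTerm>
    <mods:scriptTerm type="code">Latf</mods:scriptTerm>
  </mods:language>
</mods:mods>"""


def mods_to_dict(mods_el):
    """Flatten the few MODS fields discussed here into JSON-friendly types."""

    def texts(path):
        return [el.text.strip() for el in mods_el.findall(path, MODS_NS) if el.text]

    record_ids = texts("mods:recordInfo/mods:recordIdentifier")
    return {
        "recordIdentifier": record_ids[0] if record_ids else None,
        "languages": texts("mods:language/mods:languageTerm"),
        "scripts": texts("mods:language/mods:scriptTerm"),
    }


# What a /mods entry point might send as its response body ...
payload = json.dumps(mods_to_dict(ET.fromstring(SAMPLE_MODS)))
# ... and what the client would then get back from r.json()
received = json.loads(payload)
print(received["recordIdentifier"])
```

A flat dict like this loses most of the MODS structure, so it would only cover the narrow queries mentioned above; anything richer would need either a fuller schema or shipping the XML itself.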

Aug 17 '23 18:08 kba

Yes, and an OcrdMods would also be needed if we were to extend #698 (automatic inheritance in OcrdPage hierarchy) with the document-wide lang/script features.

Aug 18 '23 10:08 bertsky

However, this could also be achieved via a dedicated (specialised) processor (which merely fills page-level lang/script from the MODS)...

Dec 05 '23 15:12 bertsky

Valuable functionality that could be reused for OcrdMods can also be found in:

Dec 05 '23 16:12 bertsky