core
ocrd workspace rename-group: update file refs in ALTO, too
The current implementation of Workspace.rename_file_group is smart enough to also update the affected image file references within PAGE files:
https://github.com/OCR-D/core/blob/71d295ac1fccbeb4164e230bd584e1920b9ab3c8/ocrd/ocrd/workspace.py#L324-L342
It would be even better if ALTO files (i.e. /alto/Description/sourceImageInformation/fileName) were updated in a similar fashion.
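To illustrate what I mean for ALTO, here is a minimal sketch using only the standard library. The function name `update_alto_image_ref` is hypothetical (it is not part of OCR-D core), and it assumes the reference is stored verbatim in `sourceImageInformation/fileName`. It matches on local element names so it works across ALTO namespace versions.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Return the element name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def update_alto_image_ref(alto_xml: str, old_path: str, new_path: str) -> str:
    """Replace old_path with new_path in sourceImageInformation/fileName.

    Hypothetical helper for illustration only; matching is by exact
    (whitespace-stripped) string comparison.
    """
    root = ET.fromstring(alto_xml)
    for elem in root.iter():
        if _local(elem.tag) != "sourceImageInformation":
            continue
        for child in elem:
            if _local(child.tag) == "fileName" and (child.text or "").strip() == old_path:
                child.text = new_path
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree may rewrite namespace prefixes on serialization, so a real implementation would probably prefer lxml, as the PAGE code path does.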
Also, I think it would be useful to add an option that neither moves any local files nor changes any file IDs. (In that case, no references need to be updated, and it is much faster.)
Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).
> Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).
@kba should we make that a separate issue? (Use-cases are aliasing input fileGrp to OCR-D-IMG for our common workflows, or aliasing output fileGrp FULLTEXT to ALTO for myCore.)
> Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).
Ouch, just noticed that mets-alias-filegrp.xsl is fundamentally broken, for it is not allowed to reuse the same XML IDs – I would have to rename them in the new fileGrp (and re-reference them in the physical structmap). Since this kind of thing cannot easily be done in XSL (v1.0 anyway), let's please provide that via Python.
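A rough sketch of what the Python side could look like, using only the standard library. The function `alias_file_group` and its ID renaming scheme are assumptions for illustration, not existing OCR-D API. It copies the fileGrp, gives every file a fresh unique ID, and adds matching fptr entries to the physical structMap.

```python
import copy
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
NS = {"mets": METS_NS}


def alias_file_group(root: ET.Element, old_use: str, new_use: str) -> dict:
    """Add a fileGrp new_use mirroring old_use; return {old file ID: new file ID}.

    Hypothetical helper: only looks at top-level fileGrps in fileSec.
    """
    file_sec = root.find("mets:fileSec", NS)
    groups = [] if file_sec is None else file_sec.findall("mets:fileGrp", NS)
    old_grp = next((g for g in groups if g.get("USE") == old_use), None)
    if old_grp is None:
        raise ValueError(f"no fileGrp with USE={old_use!r}")

    # Collect all XML IDs in the document so the new ones stay unique.
    existing_ids = {e.get("ID") for e in root.iter() if e.get("ID")}

    new_grp = copy.deepcopy(old_grp)
    new_grp.set("USE", new_use)
    new_grp.attrib.pop("ID", None)  # never duplicate the group's own ID

    id_map = {}
    for f in new_grp.iter(f"{{{METS_NS}}}file"):
        old_id = f.get("ID")
        if old_id is None:
            continue
        # Assumed scheme: swap the group name inside the ID, else prefix it.
        if old_use in old_id:
            new_id = old_id.replace(old_use, new_use, 1)
        else:
            new_id = f"{new_use}_{old_id}"
        while new_id in existing_ids:
            new_id += "_alias"
        existing_ids.add(new_id)
        f.set("ID", new_id)
        id_map[old_id] = new_id
    file_sec.append(new_grp)

    # Re-reference the new files next to the old ones in the physical structMap.
    for smap in root.findall("mets:structMap", NS):
        if smap.get("TYPE") != "PHYSICAL":
            continue
        for div in smap.iter(f"{{{METS_NS}}}div"):
            for fptr in list(div.findall("mets:fptr", NS)):
                new_id = id_map.get(fptr.get("FILEID"))
                if new_id:
                    ET.SubElement(div, f"{{{METS_NS}}}fptr", FILEID=new_id)
    return id_map
```

The FLocat hrefs are copied unchanged, so both groups point at the same files on disk, which is the whole point of an alias.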