ocrd workspace rename-group: update file refs in ALTO, too

Open bertsky opened this issue 3 years ago • 3 comments

The current implementation of Workspace.rename_file_group is smart enough to also update the affected image file references within PAGE files:

https://github.com/OCR-D/core/blob/71d295ac1fccbeb4164e230bd584e1920b9ab3c8/ocrd/ocrd/workspace.py#L324-L342

It would be even better if ALTO files (i.e. /alto/Description/sourceImageInformation/fileName) were updated in a similar fashion.
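To illustrate what such an update could look like, here is a minimal sketch using only the Python standard library. The function name `update_alto_image_ref` is hypothetical and not part of OCR-D/core, and the rule of swapping only a path component equal to the old group name is an assumption; the actual PAGE handling in `workspace.py` may rewrite references differently.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    """Strip the namespace from a tag, since ALTO namespaces differ per version."""
    return tag.rsplit('}', 1)[-1]


def update_alto_image_ref(alto_xml: str, old_grp: str, new_grp: str) -> str:
    """Rewrite /alto/Description/sourceImageInformation/fileName (hypothetical helper).

    Assumption: the image reference is a relative path whose directory
    component equals the fileGrp name; only that component is replaced.
    """
    root = ET.fromstring(alto_xml)
    for desc in root:
        if _local(desc.tag) != 'Description':
            continue
        for info in desc:
            if _local(info.tag) != 'sourceImageInformation':
                continue
            for fname in info:
                if _local(fname.tag) == 'fileName' and fname.text:
                    parts = fname.text.strip().split('/')
                    fname.text = '/'.join(
                        new_grp if part == old_grp else part for part in parts)
    # Note: ElementTree may re-serialize namespace prefixes (e.g. as ns0).
    return ET.tostring(root, encoding='unicode')
```

Matching on local names rather than a fixed namespace keeps the sketch independent of the ALTO schema version in use.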

bertsky, Sep 27 '22 21:09

Also, I think it would be useful to add an option that leaves local files untouched entirely, i.e. neither moves them nor changes their IDs. (In that case, no references need to be updated. And it is much faster.)

Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).

bertsky, Sep 28 '22 15:09

> Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).

@kba should we make that a separate issue? (Use cases are aliasing the input fileGrp to OCR-D-IMG for our common workflows, or aliasing the output fileGrp FULLTEXT to ALTO for MyCoRe.)

bertsky, Dec 11 '23 16:12

> Another option would be to offer just making the new group an alias of the old one (as implemented via XSLT 1.0 in workflow-configuration).

Ouch, just noticed that mets-alias-filegrp.xsl is fundamentally broken: reusing the same XML IDs is not allowed, so I would have to rename them in the new fileGrp (and re-reference them in the physical structMap). Since this kind of thing cannot easily be done in XSL (v1.0 anyway), let's please provide that via Python.
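As a rough sketch of what the Python side might do (not an existing OCR-D/core API): copy the fileGrp under the new name, give every file a fresh ID, and add matching fptr entries in the physical structMap. The function name `alias_file_group` and the ID renaming scheme (swapping the group-name prefix, else prepending the new group name) are assumptions made for illustration only.

```python
import copy
import xml.etree.ElementTree as ET

METS_NS = 'http://www.loc.gov/METS/'
NS = {'mets': METS_NS}


def alias_file_group(mets_root, old_grp, new_grp):
    """Add new_grp as a copy of old_grp with unique file IDs (hypothetical helper).

    Returns a mapping from old file IDs to the newly assigned ones.
    """
    file_sec = mets_root.find('mets:fileSec', NS)
    old = file_sec.find(f"mets:fileGrp[@USE='{old_grp}']", NS)
    if old is None:
        raise ValueError(f'no fileGrp with USE={old_grp!r}')
    new = copy.deepcopy(old)
    new.set('USE', new_grp)
    id_map = {}
    for file_el in new.iter(f'{{{METS_NS}}}file'):
        old_id = file_el.get('ID')
        # Assumed scheme: replace the group prefix, else prepend the new group.
        if old_id.startswith(old_grp):
            new_id = new_grp + old_id[len(old_grp):]
        else:
            new_id = f'{new_grp}_{old_id}'
        id_map[old_id] = new_id
        file_el.set('ID', new_id)
    file_sec.append(new)
    # Reference the new IDs on the same pages as the old ones.
    for struct_map in mets_root.iterfind("mets:structMap[@TYPE='PHYSICAL']", NS):
        for div in list(struct_map.iter(f'{{{METS_NS}}}div')):
            for fptr in div.findall('mets:fptr', NS):
                if fptr.get('FILEID') in id_map:
                    ET.SubElement(div, f'{{{METS_NS}}}fptr',
                                  FILEID=id_map[fptr.get('FILEID')])
    return id_map
```

The copied entries keep their original FLocat targets, so no files are moved; only the METS gains a second, uniquely identified view of the same files.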

bertsky, Dec 14 '23 13:12