core
ocrd workspace validate: strange behaviour with symbolic links
If you have a workspace that contains symbolic links in the image source folder (OCR-D-IMG), the command
ocrd workspace validate
appears to create a copy of the image in addition to the symbolic link.
This only happens if the symbolic link has an absolute path (not a relative one).
Before validate:

After validate:

Example workspace: reproduceStrangeValidate.zip
Hint: Maybe this is related to "workspace add" and not "validate"; I am not so sure anymore.
I suspect that during validation, data is persisted at the wrong place. We had that problem before with spurious TEMP folders inside workspaces after validation. But I haven't yet investigated, will update when I do.
I think the problem is in workspace.py on line 146
url_path = Path(f.url).resolve()
If the file URL points to a symbolic link, resolve() returns the real path of the link target, which is usually outside the workspace directory.
Any chance to replace resolve() with absolute()?
url_path = Path(f.url).absolute()
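To illustrate the difference, here is a minimal standalone sketch. It is not the code from workspace.py; the directory layout and the helper name compare_resolve_and_absolute are made up for this demonstration, and it assumes a platform where creating symlinks is permitted.

```python
import tempfile
from pathlib import Path


def compare_resolve_and_absolute():
    """Return (resolved_in_workspace, absolute_in_workspace) for a symlinked image.

    Hypothetical layout, for illustration only:
      <tmp>/external/page.tif             real file outside the workspace
      <tmp>/workspace/OCR-D-IMG/page.tif  symlink with an absolute target
    """
    with tempfile.TemporaryDirectory() as tmp:
        # Normalise the temp dir itself so only our own symlink matters.
        root = Path(tmp).resolve()
        external = root / "external"
        workspace = root / "workspace"
        img_dir = workspace / "OCR-D-IMG"
        external.mkdir()
        img_dir.mkdir(parents=True)

        target = external / "page.tif"
        target.write_bytes(b"")

        link = img_dir / "page.tif"
        link.symlink_to(target)  # absolute link target

        resolved = link.resolve()   # follows the symlink to the real file
        absolute = link.absolute()  # keeps the path as given, no symlink lookup

        return (workspace in resolved.parents, workspace in absolute.parents)


if __name__ == "__main__":
    in_ws_resolved, in_ws_absolute = compare_resolve_and_absolute()
    print("resolve() stays inside workspace: ", in_ws_resolved)
    print("absolute() stays inside workspace:", in_ws_absolute)
```

With resolve() the path ends up outside the workspace, so code that checks whether a file lives inside the workspace directory would treat the linked image as external. With absolute() the path stays inside the workspace.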
It looks like workspace add no longer creates a copy of the (image) file in addition to the link (using version 2.32).
But now I have the same problem with ocrd-cis-ocropy-binarize.
--> Please clarify.
Thanks @stefanCCS for the report and @mexthecat for the analysis!
IIUC, resolve() must indeed be replaced by absolute(), and the resolve() call causes not only the validator but also all processors to behave this way.
@kba maybe we should also cover this in the tests by using symlinks somewhere in assets.
Hi @kba, is there a fix available for this issue? I want to start another project that includes some data to copy (or, better, just to link).
I've created a pull request
https://github.com/OCR-D/core/pull/954
Maybe I made it too easy. I have been using this patch for quite some time and have not seen any strange behaviour.
Fixed in v2.42.0. Thanks @mexthecat!
@kba: Just tested it, works fine. Many thanks :-)