
ocrd workspace validate: strange behaviour with symbolic links

Open stefanCCS opened this issue 3 years ago • 6 comments

If you have a workspace with symbolic links in the image source folder (OCR-D-IMG), the command ocrd workspace validate appears to create a copy of each image in addition to the symbolic link. This happens only if the symbolic link has an absolute path (not a relative one).

Before validate: grafik

After validate: grafik

Example workspace: reproduceStrangeValidate.zip

stefanCCS avatar Feb 15 '22 09:02 stefanCCS

Hint: this may be related to "workspace add" rather than "validate". I am no longer sure which of the two is responsible.

stefanCCS avatar Feb 15 '22 12:02 stefanCCS

I suspect that during validation, data is persisted at the wrong place. We had that problem before with spurious TEMP folders inside workspaces after validation. But I haven't yet investigated, will update when I do.

kba avatar Feb 15 '22 13:02 kba

I think the problem is in workspace.py on line 146

url_path = Path(f.url).resolve()

If the file URL points to a symbolic link, resolve() follows the link and returns the real path, which is usually outside the workspace directory.

Any chance to replace resolve() with absolute()?

url_path = Path(f.url).absolute()
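To illustrate the difference described above, here is a minimal sketch using only pathlib, not the actual OCR-D code. The directory layout, the file name page_0001.tif, and the helper names are assumptions made up for this illustration, and it assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import tempfile
from pathlib import Path


def make_symlinked_image():
    """Build a throwaway workspace whose image is an absolute symlink (hypothetical layout)."""
    # Resolve the temp dir first so platform-level symlinks (e.g. /tmp) do not skew the comparison.
    base = Path(tempfile.mkdtemp()).resolve()
    workspace = base / "workspace"
    external = base / "external"
    (workspace / "OCR-D-IMG").mkdir(parents=True)
    external.mkdir()

    target = external / "page_0001.tif"
    target.write_bytes(b"")  # empty placeholder standing in for an image
    link = workspace / "OCR-D-IMG" / "page_0001.tif"
    os.symlink(target, link)  # symlink with an absolute target
    return workspace, link


def is_inside(path, directory):
    """Return True if path lies below directory (purely lexical check)."""
    try:
        path.relative_to(directory)
        return True
    except ValueError:
        return False


workspace, link = make_symlinked_image()
resolved = link.resolve()   # follows the symlink to the real file
absolute = link.absolute()  # only makes the path absolute, keeps the symlink

print("resolve() inside workspace: ", is_inside(resolved, workspace))
print("absolute() inside workspace:", is_inside(absolute, workspace))
```

With this layout, the resolve() result points into the external directory, while the absolute() result stays below the workspace. Code that checks whether a file already lives in the workspace would therefore treat the resolved path as foreign.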

mexthecat avatar Feb 18 '22 09:02 mexthecat

It looks like workspace add no longer creates a copy of the (image) file in addition to the link (using version 2.32).

But now I have the same problem using ocrd-cis-ocropy-binarize.

Could you please clarify?

stefanCCS avatar Apr 14 '22 14:04 stefanCCS

Thanks @stefanCCS for the report and @mexthecat for the analysis!

IIUC, resolve() must indeed be replaced by absolute(), and the resolve() call makes not only the validator but also all processors behave the same way.

@kba maybe we should also cover this in the tests by using symlinks somewhere in assets.

bertsky avatar May 05 '22 06:05 bertsky

Hi @kba, is there a fix available for this issue? I want to start another project that includes some data to copy (or, better, just to link).

stefanCCS avatar Jul 05 '22 14:07 stefanCCS

I've created a pull request

https://github.com/OCR-D/core/pull/954

Maybe I made it too easy, but I have been using this patch for quite some time and have not seen any strange behaviour.

mexthecat avatar Nov 16 '22 09:11 mexthecat

Fixed in v2.42.0. Thanks @mexthecat!

kba avatar Nov 23 '22 16:11 kba

@kba: Just tested it and it works fine. Many thanks :-)

stefanCCS avatar Nov 25 '22 13:11 stefanCCS