workspace download: also traverse dependent file groups?

Open bertsky opened this issue 5 years ago • 3 comments

When I want to download a PAGE-XML file from a remote location, it would be very helpful if core also downloaded all the files referenced in /PcGts/Page/@imageFilename and */AlternativeImage/@filename. Is this feasible?
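
For illustration, collecting those references could look roughly like the sketch below. It uses only the Python standard library rather than core's own API; `local_name`, `image_references` and `SAMPLE` are made-up names, and the file group names in the sample are just examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical PAGE-XML snippet, only for demonstration
SAMPLE = (
    '<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">'
    '<Page imageFilename="OCR-D-IMG/0001.tif">'
    '<AlternativeImage filename="OCR-D-IMG-BIN/0001.png"/>'
    '<TextRegion id="r1">'
    '<AlternativeImage filename="OCR-D-IMG-CROP/0001_r1.png"/>'
    '</TextRegion>'
    '</Page>'
    '</PcGts>'
)


def local_name(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit('}', 1)[-1]


def image_references(page_xml):
    """Return Page/@imageFilename and all AlternativeImage/@filename values
    in document order."""
    root = ET.fromstring(page_xml)
    refs = []
    for el in root.iter():
        name = local_name(el.tag)
        if name == 'Page' and el.get('imageFilename'):
            refs.append(el.get('imageFilename'))
        elif name == 'AlternativeImage' and el.get('filename'):
            refs.append(el.get('filename'))
    return refs


if __name__ == '__main__':
    print(image_references(SAMPLE))
```

Matching on the local element name is a deliberate shortcut here, so the sketch does not depend on one particular PAGE namespace version.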

bertsky, Jan 17 '20 11:01

It's doable. Related to #378, #323 and #176.

kba, Jan 17 '20 11:01

To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?

kba, Jun 07 '20 18:06

> To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?

No, not quite (I think). After downloading a PAGE-XML, its (original and derived) image references could be either relative paths or URLs already. If they are relative paths, then instead of replacing them with a URL by prepending http://remote, it would be better to ensure these relative paths do exist locally by downloading the files and adapting their mets:file entries accordingly. If they are URLs already, they should be replaced by a relative path and then downloaded in the same way.
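
A minimal sketch of that decision, assuming the remote server actually serves the referenced files next to the PAGE-XML. `plan_download` and `fallback_dir` are hypothetical names, and the naming scheme for URL references (directory plus basename) is only one possible choice. Adapting the mets:file entries is left out.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def plan_download(page_url, ref, fallback_dir="images"):
    """Return (source_url, local_relative_path) for one image reference.

    page_url: URL the PAGE-XML was downloaded from
    ref: value of @imageFilename or AlternativeImage/@filename
    fallback_dir: hypothetical local directory for references that are URLs
    """
    if urlparse(ref).scheme in ("http", "https"):
        # Already a URL: download it as-is and refer to it by a relative path
        # (assumed naming: fallback_dir + basename of the URL path).
        name = posixpath.basename(urlparse(ref).path)
        return ref, posixpath.join(fallback_dir, name)
    # Relative path: keep it unchanged locally and resolve the download
    # source against the location of the PAGE-XML.
    return urljoin(page_url, ref), ref


if __name__ == "__main__":
    page = "https://remote/page.xml"
    print(plan_download(page, "OCR-D-IMG/0001.tif"))
    print(plan_download(page, "https://other/scans/0002.png"))
```

In both cases the PAGE-XML ends up pointing at a relative path that exists in the local workspace.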

bertsky, Jun 08 '20 10:06

> > To clarify: For a PAGE URL https://remote/page.xml, you want to download the PAGE-XML and then resolve the Page/@imageFilename / AlternativeImage/@filename references by prepending http://remote to the file paths?
>
> No, not quite (I think). After downloading a PAGE-XML, its (original and derived) image references could be either relative paths or URLs already. If they are relative paths, then instead of replacing them with a URL by prepending http://remote, it would be better to ensure these relative paths do exist locally by downloading the files and adapting their mets:file entries accordingly. If they are URLs already, they should be replaced by a relative path and then downloaded in the same way.

The difficult part is how to download those references if they are relative file URLs (i.e. were produced by OCR-D before). We now have support for both local and remote URLs (#1079), but that is not widely used yet, and even if it were, it's unlikely that OCR-D users would expose their intermediate results via URL.

The only way around this restriction is if the remote workspace is available as an OCRD-ZIP, in which case we can assume that all the referenced images are in the workspace.
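
As a rough sketch, that assumption could be checked before relying on it. This uses only the standard library, not core's own OCRD-ZIP handling. `missing_references` is a hypothetical helper, and it assumes the workspace files sit under a `data/` payload directory inside the ZIP and that the references are relative to the workspace root.

```python
import io
import posixpath
import zipfile


def missing_references(zip_file, refs, payload_dir="data"):
    """Return the references that have no matching member in the ZIP.

    zip_file: path or file-like object of the zipped workspace
    refs: image paths relative to the workspace root
    payload_dir: assumed directory inside the ZIP holding the workspace
    """
    with zipfile.ZipFile(zip_file) as zf:
        members = set(zf.namelist())
    return [ref for ref in refs
            if posixpath.join(payload_dir, ref) not in members]


if __name__ == "__main__":
    # Build a tiny in-memory ZIP for demonstration only
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("data/mets.xml", "<mets/>")
        zf.writestr("data/OCR-D-IMG/0001.tif", b"")
    buf.seek(0)
    print(missing_references(buf, ["OCR-D-IMG/0001.tif",
                                   "OCR-D-IMG-BIN/0001.png"]))
```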

AFAIK nobody except us is using @imageFilename etc. with URLs, so supporting that is probably not sensible either.

So unless I'm mistaken, there is no good way to solve this.

kba, Nov 20 '23 12:11