
bashlib: provide a Workspace.download_file analogue

Open · bertsky opened this issue 3 years ago · 1 comment

All Pythonic processors resolve and retrieve their input files via Workspace.download_file. This allows remote URLs to be downloaded ad hoc into the workspace (without changing the METS reference), and also keeps follow-up processors from downloading them again.

Unfortunately, bashlib processors have no such option yet. (Ideally, the solution would re-use the files already downloaded by the Pythonic download_file.)

bertsky, Oct 26 '22 12:10

How about an ocrd workspace get command which takes file ID(s) and returns their local path names? This would entail downloading (in the same manner as Workspace.download_file, i.e. via Resolver.download_to_directory), but without any changes to the METS. We would then encourage all bashlib-based processors to use that command just before accessing any file on disk.
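
To make the intended behaviour concrete, here is a minimal sketch of what such a lookup could do. It is not OCR-D code: get_local_paths, the file_urls mapping (a stand-in for the METS file references) and the workspace/<file ID>/<basename> layout are all assumptions made up for illustration, and only the Python standard library is used.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def get_local_paths(file_urls, file_ids, workspace_dir):
    """Return local paths for the given file IDs, downloading only when needed.

    file_urls is a hypothetical stand-in for the METS file references
    (file ID -> URL or workspace-relative path). It is only read here,
    never modified, mirroring "no changes to the METS".
    """
    workspace = Path(workspace_dir)
    paths = []
    for file_id in file_ids:
        url = file_urls[file_id]
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https", "file"):
            # Assumed layout: one subdirectory per file ID, original basename.
            target = workspace / file_id / Path(parsed.path).name
            if not target.exists():
                # First access: fetch the resource into the workspace.
                target.parent.mkdir(parents=True, exist_ok=True)
                with urllib.request.urlopen(url) as response, open(target, "wb") as out:
                    shutil.copyfileobj(response, out)
            # Later accesses fall through here and re-use the existing copy.
        else:
            # Already a workspace-relative path: nothing to download.
            target = workspace / url
        paths.append(str(target))
    return paths
```

A bashlib processor would call the command-line equivalent of this right before reading a file, so that a copy fetched by an earlier processor is picked up instead of being fetched again.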

bertsky, Oct 26 '22 14:10