
RFC: Make workspace cloning more robust

Status: Open · stweil opened this issue 5 years ago · 6 comments

Currently, cloning a workspace with `ocrd workspace clone --download` aborts if any file cannot be downloaded.

It would help if, instead of aborting the download, the clone continued and finished all other files.

Example: http://gei-digital.gei.de/viewer/metsresolver?id=PPN1024726142. Obviously, the TIFF images are only available locally and cannot be downloaded over the Internet.
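A minimal sketch of the requested behaviour, written as standalone Python and not taken from OCR-D core: the helper `download_all` and its `fetch` parameter are hypothetical. Each failing file is logged and remembered, and the loop continues with the remaining files instead of aborting.

```python
import logging
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("clone-sketch")


def download_all(
    files: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Download every file; skip (and remember) those that fail.

    Hypothetical helper: `files` maps a METS file ID to its URL and
    `fetch` performs the actual download of one URL.
    """
    downloaded: Dict[str, bytes] = {}
    failed: List[str] = []
    for file_id, url in files.items():
        try:
            downloaded[file_id] = fetch(url)
        except Exception as err:  # skip instead of raise
            log.warning("Could not download %s (%s): %s", file_id, url, err)
            failed.append(file_id)
    return downloaded, failed
```

In a real clone, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`. The caller would then report everything in `failed` once all other files are done, and could still exit with an error status at that point.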

stweil, Jan 24 '20 20:01

> It would help if instead of aborting the download all other files would be finished.

Yes, and it would be in line with the recent change for mets.xml (skip instead of raise).

kba, Jan 28 '20 14:01

> http://gei-digital.gei.de/viewer/metsresolver?id=PPN1024726142

Do you have another example? I cannot reach that one.

kba, Jan 28 '20 14:01

Nor can I. That looks like a temporary failure of the GEI website. So either wait, or look at other Intranda libraries; they might all be similar. I could also provide a local copy, but that will not help much as long as the website is down.

stweil, Jan 28 '20 14:01

Here is an extract with one of the entries that cause a fatal exception:

<mets:file ID="FILE_0028_PRESENTATION" MIMETYPE="image/tiff">
  <mets:FLocat LOCTYPE="URL" xlink:href="file:///opt/digiverso/viewer/tiff/PPN1024726142/00000028.tif"/>
</mets:file>

It usually makes no sense to try to download a `file:` URL, so such URLs could simply be copied as is, even when a download was requested.
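A sketch of that suggestion, assuming the decision is made on the URL scheme alone; `split_for_clone` is a hypothetical helper, not part of OCR-D core. `file:` URLs are kept unchanged in the cloned METS, and every other URL is queued for download.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def split_for_clone(urls: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split FLocat URLs into (to_download, keep_as_is).

    Assumption: only the URL scheme is inspected. file: URLs point to the
    publisher's local disk, so they are kept unchanged instead of fetched.
    """
    to_download: List[str] = []
    keep_as_is: List[str] = []
    for url in urls:
        if urlparse(url).scheme.lower() == "file":
            keep_as_is.append(url)
        else:
            to_download.append(url)
    return to_download, keep_as_is
```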

stweil, Jan 28 '20 15:01

GEI is online again. I tried several of their METS files, and they all include references to local files, which of course cannot be cloned.
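To check in advance whether a METS file is affected, one could list every `mets:FLocat` whose `xlink:href` uses the `file:` scheme. This sketch uses only the standard library's `xml.etree.ElementTree`; the function name `local_references` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Standard METS and XLink namespace URIs.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}
HREF = "{http://www.w3.org/1999/xlink}href"


def local_references(mets_xml: str) -> List[Tuple[str, str]]:
    """Return (file ID, href) for every mets:file whose FLocat uses file:."""
    root = ET.fromstring(mets_xml)
    found: List[Tuple[str, str]] = []
    for f in root.iterfind(".//mets:file", NS):
        for loc in f.iterfind("mets:FLocat", NS):
            href = loc.get(HREF, "")
            if href.startswith("file:"):
                found.append((f.get("ID", ""), href))
    return found
```

Applied to the extract quoted earlier (wrapped in a `mets:mets` root that declares the `mets` and `xlink` namespaces), it would report `FILE_0028_PRESENTATION`.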

stweil, Feb 21 '20 14:02

OK, I can reproduce the problem. It is on my TODO list.

kba, Feb 21 '20 14:02