dezoomify-rs
Please reconsider handling of tile download errors
The log of downloading a dictionary volume (created with a script) is 21 MB. Now I know I should grep it for ERROR.
Because of some network problems, I had 3 cases of "Only ??? tiles out of ??? could be downloaded. The resulting image was still created."
I would have noticed the problem earlier if such images had a prefix, e.g. incomplete.
It would also be convenient to have the URL of the whole affected image printed (currently only the URL of the failing tile is printed).
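Until something like this exists in dezoomify-rs itself, a small wrapper can approximate both requests. This is a sketch under two assumptions: that dezoomify-rs exits with a non-zero status when only some tiles could be downloaded, and that the output file can be given as the second positional argument. `dezoomify_checked`, the `DEZOOMIFY` variable and the `incomplete_` prefix are hypothetical names, not part of dezoomify-rs; `--retries 10` follows the maintainer's advice further down.

```shell
#!/usr/bin/env bash
# Path to the dezoomify-rs binary; override with the DEZOOMIFY environment variable.
DEZOOMIFY="${DEZOOMIFY:-./dezoomify-rs}"

# Hypothetical wrapper: download one image to OUTFILE. If dezoomify-rs exits
# with a non-zero status (assumed to mean a failed or partial download),
# print the URL of the whole image and rename the file to incomplete_<name>.
dezoomify_checked() {
  local url="$1" outfile="$2"
  local status=0
  "$DEZOOMIFY" -l --parallelism 1 --timeout 60s --retry-delay 60s --retries 10 \
    "$url" "$outfile" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "ERROR (exit status $status) for image: $url" >&2
    if [ -e "$outfile" ]; then
      # Keep the partial result, but make it impossible to miss.
      local dir base
      dir=$(dirname "$outfile")
      base=$(basename "$outfile")
      mv -- "$outfile" "$dir/incomplete_$base"
    fi
  fi
  return "$status"
}
```

Renaming rather than deleting keeps the partial image available while making it stand out in a directory listing.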
If you are doing a batch download, you should probably check the exit status of dezoomify-rs after it has run. You should also tweak the network-related settings; in particular, you should increase the number of retries when a download fails and the delay between consecutive retries.
This is my command:
time curl "https://polona.pl/iiif/item/MTI2MzI0NjU/manifest.json" | jq -r ".items[].id" | xargs -n 1 ./dezoomify-rs -l --parallelism 1 --timeout 60s --retry-delay 60s
Is there an easy way to add exit status checking? Anyway, I can live with it. As for the number of retries, I hope the network problems will not occur again. Moreover, I'm not in a hurry and I don't want to increase the server load.
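One way to add exit status checking to the command above is to replace `xargs -n 1` with a loop that runs dezoomify-rs once per URL and records the URLs that fail. A sketch, assuming a non-zero exit status means a failed or partial download; `download_all`, `DEZOOMIFY` and `failed-urls.txt` are hypothetical names, and `--retries 10` reflects the maintainer's advice rather than anything required.

```shell
#!/usr/bin/env bash
# Path to the dezoomify-rs binary; override with the DEZOOMIFY environment variable.
DEZOOMIFY="${DEZOOMIFY:-./dezoomify-rs}"

# Read one image URL per line on stdin and run dezoomify-rs for each.
# Every URL whose run exits with a non-zero status is appended to the file
# named by $1 (default: failed-urls.txt). Returns 1 if any download failed.
download_all() {
  local failed_list="${1:-failed-urls.txt}"
  local url failures=0
  while IFS= read -r url; do
    [ -n "$url" ] || continue   # skip blank lines
    # </dev/null keeps dezoomify-rs from consuming the URL list on stdin.
    if ! "$DEZOOMIFY" -l --parallelism 1 --timeout 60s --retry-delay 60s --retries 10 \
        "$url" </dev/null; then
      echo "$url" >> "$failed_list"
      failures=$((failures + 1))
    fi
  done
  echo "$failures download(s) failed" >&2
  [ "$failures" -eq 0 ]
}
```

After defining the function, the pipeline becomes `curl "https://polona.pl/iiif/item/MTI2MzI0NjU/manifest.json" | jq -r ".items[].id" | download_all failed-urls.txt`, and its exit status (`$?`) reports whether anything failed.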
Increasing the number of retries will decrease the server load, not increase it. With only one retry, when the server starts to be overloaded and responds with errors, you will quickly move on to the next tile and make one more request to the already overloaded server. With, let's say, 10 retries (and a parallelism of 1), dezoomify-rs will try 10 times with an exponential backoff strategy: it will make the second try after 10s, the next one after waiting another 20s, then 40s, and so on. This will be slower, but you can be sure not to overwhelm the server.
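To see what this costs in time: if the wait starts at a base delay d and doubles after every failure, n retries wait d, 2d, 4d, ... and add up to d·(2^n − 1) seconds in the worst case. The hypothetical helper below prints that schedule; it models the doubling described above, not dezoomify-rs's actual source code.

```shell
#!/usr/bin/env bash
# Print the wait before each retry and the worst-case total, assuming the
# delay starts at BASE seconds and doubles after every failed attempt.
# Usage: backoff_schedule BASE RETRIES
backoff_schedule() {
  local base="$1" retries="$2"
  local i=1 delay="$base" total=0
  while [ "$i" -le "$retries" ]; do
    echo "retry $i: wait ${delay}s"
    total=$((total + delay))
    delay=$((delay * 2))
    i=$((i + 1))
  done
  echo "total: ${total}s"
}
```

For example, `backoff_schedule 10 10` ends with `total: 10230s`, close to three hours of waiting for a single stubborn tile, which is exactly the slowness that protects an overloaded server.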
Thanks for the explanation. What about including it in the help? Currently it says:
--retry-delay
Yes, this should be included in the help. Are you interested in making a contribution? The argument documentation is in src/arguments.rs and the remaining documentation is in README.md.
Please have a look at my fork and check whether I understand correctly what is going on.
You can open a pull request here: https://github.com/lovasoa/dezoomify-rs/compare
I'll comment on it.
What's the exit status in case of partially saved images? Grepping for ERROR is problematic. Something like --with-errors or --without-errors is needed for users who prefer file integrity over partial results.