robust/ranged download support
EO data downloads can be pretty big, and big transfers can be brittle in some situations. If the server supports ranged downloads (HTTP `Range` requests), the transfer can be made more robust by fetching the file in chunks and retrying only the chunks that fail.
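For reference, a server that supports this answers a `Range` request with a `206 Partial Content` status and a `Content-Range` header. A minimal probe (URL is a placeholder, using the `requests` library) could look like:

```python
import requests

url = "https://example.com/asset.tif"  # placeholder for a result asset URL

# A GET for just the first byte: a 206 response means the server honors
# Range requests (a plain 200 means it ignored the header).
resp = requests.get(url, headers={"Range": "bytes=0-0"}, timeout=30)
if resp.status_code == 206:
    # Content-Range looks like "bytes 0-0/123456789": total size is after "/".
    total_size = int(resp.headers["Content-Range"].split("/")[-1])
```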
inspiration: https://gist.github.com/pvbouwel/f7d23dbe8e7207f1b7ede8e7f924d868 by @pvbouwel
Some notes on getting to more production-ready code:
- Make sure to stick to GET requests (see rev2 of the gist, which probes with a 0-0 byte range instead of HEAD), so that the approach also works for things like pre-signed S3 URLs.
- Make sure retries are done for the appropriate HTTP status codes (e.g. 429 and 5xx responses).
- Rather than writing immediately to the target file, keep the chunks separate until all chunks are downloaded, and only then reconstruct the target file. That way a failed chunk can be re-fetched without corrupting a partially written file, reaping more benefits from retries (see the sketch after this list).
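A rough sketch tying these notes together. The function name `download_ranged`, the chunk size, and the retried status codes are illustrative assumptions, not a proposed API; a real version would presumably live near the download code in `openeo/rest/job.py`:

```python
import tempfile
from pathlib import Path

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Assumption: typical transient-error statuses worth retrying.
RETRY_STATUSES = (429, 500, 502, 503, 504)


def download_ranged(url: str, target: Path, chunk_size: int = 64 * 1024 * 1024):
    """Sketch: probe with GET (not HEAD, so pre-signed S3 URLs keep working),
    retry transient statuses, and keep chunks in a temp dir until all of them
    succeeded, only then assembling the target file."""
    session = requests.Session()
    retry = Retry(total=5, backoff_factor=1, status_forcelist=RETRY_STATUSES,
                  allowed_methods=["GET"])
    session.mount("https://", HTTPAdapter(max_retries=retry))

    # Probe with a 0-0 byte range instead of HEAD (cf. rev2 of the gist).
    probe = session.get(url, headers={"Range": "bytes=0-0"}, timeout=60)
    probe.raise_for_status()
    if probe.status_code != 206:
        # No range support: fall back to a plain streaming download.
        with session.get(url, stream=True, timeout=60) as resp, open(target, "wb") as f:
            resp.raise_for_status()
            for part in resp.iter_content(chunk_size=1024 * 1024):
                f.write(part)
        return

    # Content-Range is "bytes 0-0/<total size>".
    size = int(probe.headers["Content-Range"].split("/")[-1])
    with tempfile.TemporaryDirectory() as tmp:
        chunk_paths = []
        for i, start in enumerate(range(0, size, chunk_size)):
            end = min(start + chunk_size, size) - 1
            resp = session.get(url, headers={"Range": f"bytes={start}-{end}"},
                               timeout=300)
            resp.raise_for_status()
            chunk_path = Path(tmp) / f"chunk-{i:05d}"
            chunk_path.write_bytes(resp.content)
            chunk_paths.append(chunk_path)
        # Only reconstruct the target once every chunk downloaded successfully.
        with open(target, "wb") as out:
            for chunk_path in chunk_paths:
                out.write(chunk_path.read_bytes())
```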
We solved that in the WEED project for cases where we have to download the data locally: the openEO pipeline always dumps the files into an S3 bucket, and the job manager then uses threaded downloads to fetch the files from S3, deleting each file on S3 after a successful download (verified with an ETag check). This is even considerably faster than downloading directly from openEO. Example: for a 4 GB result file, the direct download takes roughly 20-25 minutes once the file is successfully processed, while with our setup the download takes roughly 3-4 minutes. https://github.com/ESA-WEED-project/eo_processing/blob/main/src/eo_processing/utils/jobmanager.py#L260
If you have questions about the storage object we are using, just ask :)
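For readers who don't want to dig into the linked `jobmanager.py`: this is not the actual eo_processing implementation, just a boto3-based sketch of the same idea (threaded S3 downloads, verify against the ETag, then delete the S3 copy). Bucket/key handling is illustrative, and note that the ETag equals the plain MD5 of the content only for non-multipart uploads:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import boto3

s3 = boto3.client("s3")  # assumption: credentials/endpoint configured elsewhere


def fetch_and_verify(bucket: str, key: str, target_dir: Path):
    """Download one object, verify it against the S3 ETag, then delete it.
    Caveat: ETag == MD5 only for non-multipart uploads; multipart objects
    would need a part-aware comparison."""
    target = target_dir / Path(key).name
    etag = s3.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')
    s3.download_file(bucket, key, str(target))
    md5 = hashlib.md5()
    with open(target, "rb") as f:
        for block in iter(lambda: f.read(8 * 1024 * 1024), b""):
            md5.update(block)
    if md5.hexdigest() != etag:
        target.unlink()
        raise IOError(f"ETag mismatch for {key}")
    # Only remove the S3 copy once the local copy is verified.
    s3.delete_object(Bucket=bucket, Key=key)


def fetch_all(bucket: str, keys: list, target_dir: Path, workers: int = 8):
    # Threaded downloads: S3 sustains much higher aggregate throughput
    # than a single connection to the openEO results endpoint.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_and_verify, bucket, k, target_dir)
                   for k in keys]
        for f in futures:
            f.result()  # re-raise any download/verification error
```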
Pointer to where this probably needs to be added: https://github.com/Open-EO/openeo-python-client/blob/9b92d6f59566ec2d32deaaddae261e683bdd9861/openeo/rest/job.py#L405
Example problematic file: https://openeo.dataspace.copernicus.eu/openeo/1.2/jobs/j-2504020555234b3287f2fe696bea5ded/results/assets/MzJjYzdkZGItZjdlMS00YjFjLTk3OTYtZjlmZTM5Y2I4ZmVi/efc6009b1fdb1d283b91e65538400aac/feature-datacubes_v1_feature-cube_year2024_E412N322.tif?expires=1744207076