openeo-python-client

robust/ranged download support

Status: Open. soxofaan opened this issue 10 months ago • 4 comments

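EO data downloads can be pretty big, and big transfers can be brittle in some situations. If the server supports ranged downloads (HTTP Range requests), it's possible to do this more robustly, for example by resuming an interrupted transfer or by retrying individual chunks instead of the whole file.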

soxofaan, Mar 10 '25
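To illustrate what ranged downloads make possible, here is a minimal sketch (Python standard library only; not code from the openeo client or the gist) that resumes an interrupted download by asking the server only for the bytes missing from a partial local file. It assumes the server answers Range requests with 206 Partial Content; resume_headers and resume_download are hypothetical helper names.

```python
import os
import shutil
import urllib.error
import urllib.request


def resume_headers(dest: str) -> dict:
    """Build a Range header that requests only the bytes not yet on disk."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    # 'bytes=N-' means "from byte N to the end of the file"
    return {"Range": f"bytes={offset}-"} if offset > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 1024 * 1024) -> None:
    """Download url to dest, continuing a partial download if the server allows it."""
    request = urllib.request.Request(url, headers=resume_headers(dest))
    try:
        response = urllib.request.urlopen(request, timeout=60)
    except urllib.error.HTTPError as e:
        if e.code == 416:
            # Range starts at or past the end of the file: assume it is already complete
            return
        raise
    with response:
        # 206 Partial Content: the server honoured Range, so append the missing tail.
        # 200 OK: the server sent the whole file, so overwrite from scratch.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as f:
            shutil.copyfileobj(response, f, chunk_size)
```

Calling resume_download(url, dest) again after a dropped connection continues from where the previous attempt stopped. A more careful version would probably also send an If-Range header (with the ETag) so that a file that changed on the server is not stitched together from two versions.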

inspiration: https://gist.github.com/pvbouwel/f7d23dbe8e7207f1b7ede8e7f924d868 by @pvbouwel

soxofaan, Mar 10 '25

Some notes on getting more production-like code:

  • Limit requests to GET operations (see rev2, which uses a 0-0 byte range request), as that also works for things like pre-signed S3 URLs, which are typically signed for one specific HTTP method
  • Make sure failed requests are retried for certain HTTP status codes
  • Rather than writing immediately to the target file, keep the chunks separate until all of them are downloaded, and only then reconstruct the file (to get more benefit from the retries)

pvbouwel, Mar 10 '25
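The three notes above could be combined roughly as follows. This is a minimal sketch using only the Python standard library, not the gist's code or an openeo API: the file size is learned from a GET for bytes 0-0 (reading the total from the Content-Range header, so no HEAD request is needed), each chunk is fetched with retries, and chunks are kept as separate temporary files until all of them are present. The set of retryable status codes, the backoff schedule, the chunk size, and all function names are assumptions.

```python
import http.client
import os
import re
import shutil
import tempfile
import time
import urllib.error
import urllib.request

# Assumption: which status codes are worth retrying is a judgement call.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def parse_total_size(content_range: str) -> int:
    """Total size from a Content-Range header such as 'bytes 0-0/12345'."""
    match = re.fullmatch(r"bytes \d+-\d+/(\d+)", content_range.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {content_range!r}")
    return int(match.group(1))


def byte_ranges(total: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total) into inclusive (start, end) pairs for Range headers."""
    return [(s, min(s + chunk_size, total) - 1) for s in range(0, total, chunk_size)]


def probe_size(url: str) -> int:
    """Learn the file size with a GET for bytes 0-0 instead of a HEAD request."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=60) as response:
        if response.status != 206:
            raise RuntimeError("server does not seem to support ranged downloads")
        return parse_total_size(response.headers.get("Content-Range", ""))


def get_range(url: str, start: int, end: int, retries: int = 5) -> bytes:
    """GET one byte range, retrying transient failures with exponential backoff."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    for attempt in range(retries + 1):
        if attempt:
            time.sleep(2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                if response.status != 206:
                    raise RuntimeError("server ignored the Range header")
                data = response.read()
            if len(data) == end - start + 1:
                return data  # a short read falls through and is retried
        except urllib.error.HTTPError as e:
            if e.code not in RETRYABLE_STATUS:
                raise
        except (OSError, http.client.HTTPException):
            pass  # connection reset, timeout, incomplete read, ...
    raise RuntimeError(f"giving up on bytes {start}-{end} after {retries + 1} attempts")


def assemble(parts: list[str], dest: str) -> None:
    """Concatenate the chunk files, in order, into the target file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)


def download_chunked(url: str, dest: str, chunk_size: int = 16 * 1024 * 1024) -> None:
    """Fetch every chunk into its own temporary file; write dest only when all succeeded."""
    total = probe_size(url)
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i, (start, end) in enumerate(byte_ranges(total, chunk_size)):
            part = os.path.join(tmp, f"part-{i:06d}")
            with open(part, "wb") as f:
                f.write(get_range(url, start, end))
            parts.append(part)
        assemble(parts, dest)
```

Because every chunk is fetched independently, a failure only costs that chunk's retries. The trade-off is that disk usage briefly doubles while the parts are concatenated. Chunks are fetched sequentially here; a thread pool could fetch them in parallel.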

We solved that in the WEED project, for cases where we have to download the data locally, as follows: the openEO pipeline always dumps the files into an S3 bucket, and the jobmanager then downloads the files from S3 with multiple threads and, after a successful download (ETag check), deletes the file on S3. This is even way faster than downloading directly from openEO. Example: for a 4 GB output file, a direct download to local disk takes roughly 20-25 min after processing has finished; with our setup, the download takes roughly 3-4 min. https://github.com/ESA-WEED-project/eo_processing/blob/main/src/eo_processing/utils/jobmanager.py#L260

If you have questions about the object storage we are using, just ask :)

mbuchhorn, Apr 02 '25
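The WEED approach (stage results in S3, download them in threads, verify the ETag, then delete) might be sketched like this. It is not the code from the linked jobmanager.py; storage access is passed in as plain callables so the sketch does not depend on a particular S3 client. It assumes the ETag is the MD5-based value S3 reports for plaintext or SSE-S3 objects (SSE-KMS objects have ETags that are not content MD5s, and S3-compatible stores may differ), and that multipart uploads used 8 MiB parts (boto3's default multipart chunk size); the part size actually used when writing the results is not known here. All function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Optional

DEFAULT_PART_SIZE = 8 * 1024 * 1024  # assumption: boto3's default multipart chunk size


def s3_etag(path: str, part_size: Optional[int] = None) -> str:
    """ETag S3 would typically report: plain MD5 for a single-part upload,
    or MD5 of the concatenated part MD5s plus '-<parts>' for a multipart upload."""
    with open(path, "rb") as f:
        if part_size is None:
            md5 = hashlib.md5()
            for block in iter(lambda: f.read(1024 * 1024), b""):
                md5.update(block)
            return md5.hexdigest()
        digests = [hashlib.md5(part).digest() for part in iter(lambda: f.read(part_size), b"")]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def etag_matches(path: str, etag: str, part_size: int = DEFAULT_PART_SIZE) -> bool:
    """Compare a local file with the ETag of the object it was downloaded from."""
    etag = etag.strip('"')  # S3 returns ETags wrapped in double quotes
    return s3_etag(path, part_size if "-" in etag else None) == etag


def fetch_and_clean(
    keys: list[str],
    download: Callable[[str], tuple[str, str]],  # key -> (local path, remote ETag)
    delete: Callable[[str], None],  # removes the remote object
    max_workers: int = 4,
) -> dict[str, bool]:
    """Download objects in parallel; delete each one remotely only if its ETag matches."""

    def worker(key: str) -> bool:
        path, etag = download(key)
        verified = etag_matches(path, etag)
        if verified:
            delete(key)  # only now is it safe to drop the remote copy
        return verified

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, key): key for key in keys}
        return {futures[f]: f.result() for f in as_completed(futures)}
```

With boto3, download could wrap s3.download_file(bucket, key, local_path) together with s3.head_object(Bucket=bucket, Key=key)["ETag"], and delete could wrap s3.delete_object(Bucket=bucket, Key=key). Objects whose ETag does not match stay in the bucket, so a failed transfer can simply be retried.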

Pointer to where this probably needs to be added: https://github.com/Open-EO/openeo-python-client/blob/9b92d6f59566ec2d32deaaddae261e683bdd9861/openeo/rest/job.py#L405

Example problematic file: https://openeo.dataspace.copernicus.eu/openeo/1.2/jobs/j-2504020555234b3287f2fe696bea5ded/results/assets/MzJjYzdkZGItZjdlMS00YjFjLTk3OTYtZjlmZTM5Y2I4ZmVi/efc6009b1fdb1d283b91e65538400aac/feature-datacubes_v1_feature-cube_year2024_E412N322.tif?expires=1744207076

jdries, Apr 02 '25