wxee
Improve download stability
The current download system is pretty solid with automated retrying, but the cdsapi package has a more extensive system that should improve download stability. See their implementation for reference.
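For context, the kind of retrying discussed here can be sketched as a generic retry loop with capped exponential backoff. This is only an illustration, not the `cdsapi` implementation or the current wxee code; the `retry` helper, its parameter names, and the default exception types are all assumptions.

```python
import random
import time


def retry(func, max_attempts=5, base_delay=1.0, max_delay=60.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is exhausted.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the last error.
                raise
            # Double the delay on each failure, cap it at max_delay,
            # and add jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A download call would be wrapped as `retry(lambda: download(url))`, where `download` stands in for whatever function performs the request.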
There are a few recent additions to the EE API that may make this easier.
- `ee.Image.getDownloadURL` now accepts a `format` parameter, so we no longer have to deal with zipped GeoTIFFs.
- `ee.data.computePixels` allows image downloads without the intermediate URL generation step. I'm not sure what the performance implications are, but it should at least simplify, if not speed up, downloads. This method seems to accept the same parameters and be subject to the same size restrictions as `getDownloadURL`. I believe this was previously REST API only, but it is now available through the Python client API.
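A rough sketch of how the two calls might look side by side. The helper names (`get_tiff_url`, `compute_tiff_bytes`) and the `compute_pixels` hook are hypothetical, and the second helper assumes the image has already been clipped and scaled to a size within the download limits.

```python
def get_tiff_url(image, region, scale, crs="EPSG:4326"):
    """Request a download URL for an unzipped GeoTIFF (hypothetical helper)."""
    params = {"format": "GEO_TIFF", "region": region, "scale": scale, "crs": crs}
    return image.getDownloadURL(params)


def compute_tiff_bytes(image, compute_pixels=None):
    """Fetch GeoTIFF bytes directly, with no intermediate URL (hypothetical helper).

    Assumes `image` is already clipped and scaled to a downloadable size.
    """
    if compute_pixels is None:
        # Imported lazily so the sketch can be exercised without Earth Engine.
        import ee
        compute_pixels = ee.data.computePixels
    return compute_pixels({"expression": image, "fileFormat": "GEO_TIFF"})
```

The URL route still needs a second request to fetch the file, while the `computePixels` route returns the data in a single call.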
Just ran some benchmarks on download speed, and `fsspec` seems to have a big advantage over the current `requests` system. It can also handle concurrent downloads out of the box. That may be useful, but unfortunately I don't think it will be enough to let us drop `joblib` as a dependency, since we'll still need that for grabbing URLs.
I don't love the idea of adding a new dependency, but if it can reduce download times substantially and simplify the download system, I think it's worth adding `fsspec`.
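As a minimal sketch of the concurrent download idea, assuming `fsspec` is installed with its HTTP backend: passing a list of paths to a filesystem's `cat` method returns a mapping of path to bytes. The `fetch_all` helper and its `fs` argument are hypothetical, not part of wxee.

```python
def fetch_all(urls, fs=None):
    """Download several URLs in one call (hypothetical helper).

    Returns the mapping of path to bytes produced by the filesystem.
    """
    if fs is None:
        # Imported lazily; assumes fsspec and its HTTP dependencies are installed.
        import fsspec
        fs = fsspec.filesystem("https")
    # Given a list, cat() returns a dict; the HTTP filesystem is expected
    # to fetch the entries concurrently rather than one at a time.
    return fs.cat(list(urls))
```

The URLs themselves would still have to come from `getDownloadURL`, which is why this alone would not remove the need for `joblib`.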
With `ee.data.computePixels` now available in the Python API (as of 2023-02-15), that will probably be the most straightforward way to grab image data.
It has the same size limitation as other methods, but allows data to be retrieved directly rather than through an intermediate URL, which should be a win for performance, simplicity, and reliability. Also, this would allow us to avoid adding `fsspec` and probably remove `requests` as dependencies.
I need to do some benchmarking to make sure there are no downsides, but at the moment this looks like the way to go. Note that, as with all direct `GEO_TIFF` format downloads, it does not currently export band names, which means we unfortunately have to grab them manually with `getInfo`.
A quick-and-dirty benchmark test says `computePixels` is noticeably faster than downloading with `fsspec` and the current `requests` implementation, even for a single image where you have to grab `bandNames`. With more images, that improvement should scale, since `bandNames` will only need to be retrieved once.
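The "retrieve `bandNames` once" idea could look roughly like the sketch below. It assumes every image in the batch has the same bands and is already sized within the download limits; `download_images` and the `compute_pixels` hook are hypothetical names.

```python
def download_images(images, compute_pixels=None):
    """Return (band_names, [tiff_bytes, ...]) for a batch of images.

    Hypothetical helper: assumes all images share the same band names.
    """
    if compute_pixels is None:
        # Imported lazily so the sketch can be exercised without Earth Engine.
        import ee
        compute_pixels = ee.data.computePixels
    images = list(images)
    # One blocking getInfo call, reused for every image in the batch.
    band_names = images[0].bandNames().getInfo()
    tiffs = [
        compute_pixels({"expression": image, "fileFormat": "GEO_TIFF"})
        for image in images
    ]
    return band_names, tiffs
```

The band names can then be attached to each array after the GeoTIFF bytes are read, since the files themselves do not carry them.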
Time to download a single-band GridMET image at native resolution:
| Method | Time |
|---|---|
| `getDownloadURL` + `requests` | 3.2s |
| `getDownloadURL` + `fsspec` | 2.2s |
| `computePixels` | 1.9s |
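
For anyone wanting to repeat a comparison like this, a small timing harness along these lines would do. It is not the script that produced the numbers above; `time_download` and its `repeats` argument are hypothetical.

```python
import time


def time_download(download, repeats=3):
    """Run download() several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        download()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to one-off network hiccups than the mean.
    return min(timings)
```

Each method would be passed in as a zero-argument callable, for example `time_download(lambda: compute_pixels_download(image))`, where `compute_pixels_download` is a placeholder for the function under test.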