Enhancement to ERA5 Data Retrieval and Download Process
This update optimizes the retrieval and caching of ERA5 data from the Climate Data Store (CDS). Key changes include:
- Caching Mechanism: Added a caching mechanism to prevent repeated downloads for identical data requests. The cache files are named based on a unique hash of the request parameters, making subsequent retrievals faster by using pre-downloaded data.
- Custom Download Function: Integrated a custom download function with a progress bar to enhance user experience. The function uses chunked downloading with error handling and retry mechanisms for a robust download process.
- Progress Bar: A dynamic progress bar displays the download status of multiple files, with completed files removed from the display to improve readability.
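To illustrate the chunked download with retries described above, here is a minimal sketch using only the standard library. It is not the code from this PR: `download_with_retry`, `open_stream`, and `on_progress` are hypothetical names, and the source is any callable that returns a readable binary stream, so a real HTTP response could be substituted. The `on_progress` hook is where a progress bar would be updated.

```python
import io
import time
from typing import BinaryIO, Callable


def download_with_retry(
    open_stream: Callable[[], BinaryIO],
    target: BinaryIO,
    chunk_size: int = 8192,
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
    on_progress: Callable[[int], None] = lambda n: None,
) -> int:
    """Copy a stream into `target` chunk by chunk, restarting on I/O errors.

    Returns the number of bytes written on success.
    """
    for attempt in range(1, max_retries + 1):
        # Start every attempt from an empty target so partial data is discarded.
        target.seek(0)
        target.truncate()
        written = 0
        try:
            with open_stream() as source:
                # Read fixed-size chunks until the stream is exhausted.
                while chunk := source.read(chunk_size):
                    target.write(chunk)
                    written += len(chunk)
                    on_progress(written)  # e.g. update a progress bar here
            return written
        except OSError:
            if attempt == max_retries:
                raise  # give up after the last allowed attempt
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise ValueError("max_retries must be at least 1")
```

Restarting from scratch on each retry keeps the sketch simple; resuming a partial download would need range requests, which are outside the scope of this example.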
These improvements aim to make data retrieval more efficient and user-friendly.
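As a rough illustration of naming cache files by a hash of the request parameters, the following sketch derives a deterministic file path from a request dictionary. The function name `cache_path`, the `era5_` prefix, the `cache` directory, and the choice of SHA-256 are assumptions made for this example and may differ from what the PR implements.

```python
import hashlib
import json
from pathlib import Path


def cache_path(request: dict, cache_dir: str = "cache", suffix: str = ".nc") -> Path:
    """Return a cache file path that depends only on the request contents."""
    # Serialise with sorted keys so that logically identical requests
    # produce the same string regardless of dictionary insertion order.
    canonical = json.dumps(
        request, sort_keys=True, separators=(",", ":"), default=str
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"era5_{digest}{suffix}"
```

A download routine could check `cache_path(request).exists()` before contacting the CDS. Note that list values are hashed in order, so the same variables listed in a different order would yield a different cache file unless the lists are sorted first.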
Closes # (if applicable).
Changes proposed in this Pull Request
Checklist
- [x] Code changes are sufficiently documented; i.e. new functions contain docstrings and further explanations may be given in `doc`.
- [ ] Newly introduced dependencies are added to `environment.yaml`, `environment_docs.yaml` and `setup.py` (if applicable).
- [ ] A note for the release notes `doc/release_notes.rst` of the upcoming release is included.
- [ ] Unit tests for new features were added (if applicable).
- [ ] I consent to the release of this PR's code under the MIT license.
Hi @lkstrp, we asked @yndevops2 to help us speed up the download because, with the main-branch version of Atlite, it was not possible to download the global, grid-scale, multi-year capacity factor time series we needed for a project. With this upgrade, that is now possible.
The caching is behind an optional flag anyway, but we can discuss whether you'd be interested in integrating only the sped-up download. The idea behind the caching was that when you change the region of interest to something smaller than what has already been downloaded, you could avoid redownloading the data.
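The region-reuse idea can be sketched as a bounding-box containment check. This is only an illustration under assumptions: `BoundingBox` and `find_reusable` are hypothetical names, the extents of previously downloaded data are assumed to be recorded somewhere alongside the cached files, and boxes crossing the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent in degrees (hypothetical helper for this sketch)."""

    west: float
    south: float
    east: float
    north: float

    def contains(self, other: "BoundingBox") -> bool:
        # True when `other` lies entirely inside this box (edges included).
        return (
            self.west <= other.west
            and self.south <= other.south
            and self.east >= other.east
            and self.north >= other.north
        )


def find_reusable(
    requested: BoundingBox, cached: Iterable[BoundingBox]
) -> Optional[BoundingBox]:
    """Return the first cached extent that fully covers the requested one."""
    for box in cached:
        if box.contains(requested):
            return box
    return None  # nothing suitable cached, a fresh download is needed
```

For example, a request for a single country could be served by slicing a previously cached continental download instead of contacting the CDS again.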