tropo_pyaps3: parallel downloads
Description of proposed changes
Reminders
- [ ] Fix #xxxx
- [ ] Pass Pre-commit check (green)
- [ ] Pass Codacy code review (green)
- [ ] Pass Circle CI test (green)
- [ ] Make sure that your code follows our style. Use the other functions/files as a basis.
- [ ] If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
- [ ] If adding new functionality, add a detailed description to the documentation and/or an example.
Thank you @ritwika21 for contributing!
Could you add some description to the PR?
And some questions on the ERA5 parallel downloading:
- How much time does this PR save compared with the current version? A concrete example would help.
- As far as I was aware a couple of years ago, ECMWF (via the Copernicus Climate Data Store) allowed a maximum of 3 submitted jobs per user at the same time. If that 3-job limit still exists, would it make more sense to set the number of parallel jobs to 3 instead of 64?
pre-commit.ci autofix
I heard that downloading one date as one file is not the right approach. ECMWF might have options to download all the required days for one SAR dataset as a single file. I asked some atmospheric scientists for help, and they were shocked by how we download the data. I have not pursued this yet, though.
But until the overall approach is fixed, any improvement is of course greatly appreciated!
Any link to code or documentation for the all-at-once download approach would be very helpful.
PR Summary
This pull request introduces parallel downloading of GRIB files in the tropo_pyaps3.py module using Python's ThreadPoolExecutor. The change aims to speed up the download of weather reanalysis data by fetching multiple files concurrently. A new helper function, dload_grib_files_worker, handles the download logic for an individual file. Additionally, a minor modification in readfile.py changes the date parsing logic to split on underscores instead of colons, which likely aligns with a change in the input data format.
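For readers who have not opened the diff, the pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the body of `dload_grib_files_worker` is a placeholder that only simulates a download, and `dload_grib_files_parallel` and `MAX_PARALLEL_JOBS` are hypothetical names. The default of 3 workers is an assumption that follows the job limit raised earlier in this thread, which has not been confirmed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Assumption: cap concurrency at 3, per the (unconfirmed) per-user job limit
# mentioned in the review comments.
MAX_PARALLEL_JOBS = 3


def dload_grib_files_worker(grib_file):
    """Placeholder worker: simulates downloading a single GRIB file."""
    time.sleep(0.01)  # stands in for the network request
    return grib_file


def dload_grib_files_parallel(grib_files, max_workers=MAX_PARALLEL_JOBS):
    """Hypothetical helper: run the worker over all files with a bounded pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the input list
        return list(executor.map(dload_grib_files_worker, grib_files))
```

Threads suit this workload because each worker spends most of its time waiting on the network, so the pool size mainly controls how many requests are in flight at once.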
Review Checklist
- [ ] Fix #xxxx (No issue number provided)
- [ ] Pass Pre-commit check (green) (Ensure pre-commit checks are passing)
- [ ] Pass Codacy code review (green) (Ensure Codacy checks are passing)
- [ ] Pass Circle CI test (green) (Ensure Circle CI tests are passing)
- [ ] Make sure that your code follows our style. Use the other functions/files as a basis. (Verify code style consistency)
- [ ] If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration. (Ensure function behavior changes are documented)
- [ ] If adding new functionality, add a detailed description to the documentation and/or an example. (Ensure new functionality is documented)
Suggestion
Consider adding error handling within the dload_grib_files_worker function to manage potential exceptions during file downloads. This could prevent the entire download process from failing if a single file encounters an issue. Additionally, it might be beneficial to include logging to track the progress and status of each file download, which can aid in debugging and monitoring.
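One way this suggestion could look is sketched below. It is an illustration under stated assumptions, not the PR's implementation: `fetch_one` is a hypothetical stand-in for the real download call and fails on purpose for file names containing "bad", and `dload_all` is a hypothetical driver. Only `dload_grib_files_worker` is a name taken from the PR, and its body here is invented for the example.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def fetch_one(grib_file):
    """Hypothetical stand-in for the real download; fails for names with 'bad'."""
    if "bad" in grib_file:
        raise RuntimeError(f"simulated download failure for {grib_file}")
    return grib_file


def dload_grib_files_worker(grib_file):
    """Download one file; log and report failure instead of raising."""
    try:
        fetch_one(grib_file)
    except Exception as exc:
        logger.warning("download failed for %s: %s", grib_file, exc)
        return grib_file, False
    logger.info("downloaded %s", grib_file)
    return grib_file, True


def dload_all(grib_files, max_workers=3):
    """Hypothetical driver: collect successes and failures separately."""
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(dload_grib_files_worker, f) for f in grib_files]
        for future in as_completed(futures):
            grib_file, ok = future.result()
            (succeeded if ok else failed).append(grib_file)
    # as_completed yields in completion order, so sort for a stable result
    return sorted(succeeded), sorted(failed)
```

Returning the list of failed files lets the caller decide whether to retry them or stop, rather than having one bad file abort the whole batch.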