
tropo_pyaps3: parallel downloads

Open ritwika21 opened this issue 1 year ago • 5 comments

Description of proposed changes

Reminders

  • [ ] Fix #xxxx
  • [ ] Pass Pre-commit check (green)
  • [ ] Pass Codacy code review (green)
  • [ ] Pass Circle CI test (green)
  • [ ] Make sure that your code follows our style. Use the other functions/files as a basis.
  • [ ] If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration.
  • [ ] If adding new functionality, add a detailed description to the documentation and/or an example.

ritwika21 avatar May 12 '24 16:05 ritwika21

Thank you @ritwika21 for contributing!

Could you add some description to the PR?

And some questions on the ERA5 parallel downloading:

  1. How much time does it save now with this PR, compared with the current version, with an example?
  2. As far as I was aware a couple of years ago, ECMWF (via the Copernicus Climate Data Store) allows a maximum of 3 submitted jobs per user at the same time. If that 3-job limit still exists, would it make more sense to set the number of parallel jobs to 3 instead of 64?
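
Both questions can be explored offline before touching the CDS servers. The sketch below is only an illustration: `fake_download`, `ConcurrencyProbe`, `run`, the `ERA5_<date>.grb` file names, and the constant `MAX_CDS_JOBS = 3` are hypothetical stand-ins, not MintPy or PyAPS code, and the 3-job limit is the assumption stated above rather than a verified current policy. It times a simulated download and records the peak number of simultaneous jobs for a given worker count.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Assumed per-user job limit discussed above; check the current CDS policy.
MAX_CDS_JOBS = 3


class ConcurrencyProbe:
    """Context manager that records the peak number of simultaneous entries."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self._active -= 1
        return False


def fake_download(date, probe, delay=0.02):
    """Stand-in for one ERA5 request: sleeps instead of contacting a server."""
    with probe:
        time.sleep(delay)
    return f"ERA5_{date}.grb"  # hypothetical file name


def run(dates, num_workers):
    """Return (files, peak concurrency, elapsed seconds) for one worker count."""
    probe = ConcurrencyProbe()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        files = list(executor.map(lambda d: fake_download(d, probe), dates))
    return files, probe.peak, time.perf_counter() - start
```

Calling `run(dates, 1)` and `run(dates, MAX_CDS_JOBS)` on the same date list gives a rough sequential-versus-parallel comparison. A real benchmark would replace `fake_download` with the actual download call, and CDS queueing time could dominate the result.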

yunjunz avatar May 29 '24 03:05 yunjunz

pre-commit.ci autofix

yunjunz avatar May 29 '24 03:05 yunjunz

I heard that downloading one date as one file is not the right approach. ECMWF might have options to download all the required days for one SAR dataset as a single file. I asked some atmospheric scientists for help, and they were shocked at how we download the data. But I have not pursued this yet.

But until the overall approach is fixed, any improvement is of course greatly appreciated!

falkamelung avatar May 29 '24 03:05 falkamelung

Any link to code or documentation for the all-at-once download approach would be very helpful.

yunjunz avatar May 29 '24 04:05 yunjunz

PR Summary

This Pull Request introduces parallel downloading of GRIB files in the tropo_pyaps3.py module by utilizing Python's ThreadPoolExecutor. The change aims to improve the efficiency of downloading weather re-analysis data by processing multiple files concurrently. A new helper function, dload_grib_files_worker, is introduced to handle the download logic for individual files. Additionally, a minor modification in readfile.py changes the date parsing logic to split on underscores instead of colons, which likely aligns with a change in the input data format.
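
The general pattern described in this summary can be sketched as follows. This is a minimal illustration, not the code from the PR: `download_all` and its `download_one` argument are hypothetical names, and the real signature of `dload_grib_files_worker` is not shown in this thread. The default of 3 workers is likewise an assumption taken from the job limit discussed earlier.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(grib_files, download_one, num_workers=3):
    """Call download_one(path) for every path using a thread pool.

    grib_files   : list of target file paths
    download_one : callable taking one path (hypothetical stand-in for the
                   per-file worker introduced by the PR)
    num_workers  : upper bound on simultaneous downloads (assumed default)

    Results are returned in the same order as grib_files.
    """
    if not grib_files:
        return []
    # Never start more threads than there are files, and always at least one.
    workers = max(1, min(num_workers, len(grib_files)))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(download_one, grib_files))
```

Threads suit this job because each download spends most of its time waiting on the network, so the global interpreter lock is not the bottleneck. Note that `executor.map` re-raises the first worker exception when its results are consumed, so one failed file aborts the collection of results.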

Review Checklist

  • [ ] Fix #xxxx (No issue number provided)
  • [ ] Pass Pre-commit check (green) (Ensure pre-commit checks are passing)
  • [ ] Pass Codacy code review (green) (Ensure Codacy checks are passing)
  • [ ] Pass Circle CI test (green) (Ensure Circle CI tests are passing)
  • [ ] Make sure that your code follows our style. Use the other functions/files as a basis. (Verify code style consistency)
  • [ ] If modifying functionality, describe changes to function behavior and arguments in a comment below the function declaration. (Ensure function behavior changes are documented)
  • [ ] If adding new functionality, add a detailed description to the documentation and/or an example. (Ensure new functionality is documented)

Suggestion

Consider adding error handling within the dload_grib_files_worker function to manage potential exceptions during file downloads. This could prevent the entire download process from failing if a single file encounters an issue. Additionally, it might be beneficial to include logging to track the progress and status of each file download, which can aid in debugging and monitoring.
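
A minimal sketch of this suggestion, under stated assumptions: `download_with_report` and `download_one` are hypothetical names, and the actual worker in the PR may handle errors differently. Each file is submitted separately so that an exception from one download is logged and recorded instead of stopping the others.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def download_with_report(grib_files, download_one, num_workers=3):
    """Download files concurrently and report successes and failures.

    download_one is a hypothetical per-file callable that raises on failure.
    Returns (sorted list of succeeded paths, dict mapping failed path to
    the exception it raised).
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Map each future back to its file so failures can be attributed.
        futures = {executor.submit(download_one, f): f for f in grib_files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
            except Exception as exc:  # keep going when one file fails
                logger.warning("download failed for %s: %s", path, exc)
                failed[path] = exc
            else:
                logger.info("downloaded %s", path)
                succeeded.append(path)
    return sorted(succeeded), failed
```

The caller can then retry only the paths in `failed`, or raise a single summary error after all downloads have been attempted.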

This comment was generated by AI. Information provided may be incorrect.


codeautopilot[bot] avatar Feb 09 '25 17:02 codeautopilot[bot]