cosima-cookbook
Optimisation/best-practice xarray and dask programming patterns
Many people report problems with running calculations on large datasets, and would like some general advice on the best approaches for tackling large problems.
There are lots of parameters that determine the success/efficiency of a calculation:
- Order of operations
- Calculating intermediate results
- Dask chunking
- netCDF chunking on disk
- Number of dask workers (or not using a scheduler/dask at all)
- Number of threads and amount of memory per worker
It becomes very complex very quickly.
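As a rough illustration of how a few of these parameters show up in code, here is a minimal sketch on synthetic data. The variable and dimension names (`temp`, `yt_ocean`, `xt_ocean`), the array sizes, the chunk size and the worker count are all made up for the example and are not recommendations; it also uses the local threaded scheduler rather than a distributed cluster.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for model output (sizes and names are illustrative only).
data = np.random.default_rng(0).random((120, 50, 60))
temp = xr.DataArray(data, dims=("time", "yt_ocean", "xt_ocean"), name="temp")

# Dask chunking: split along time only, here into 10 chunks of 12 steps.
temp = temp.chunk({"time": 12})

# Order of operations: subset first, then reduce, so less data is processed.
subset = temp.isel(yt_ocean=slice(0, 10))
lazy_mean = subset.mean("time")  # still lazy, nothing computed yet

# Intermediate results: compute the mean once and reuse the in-memory result.
# Number of workers: passed straight to the local threaded scheduler here.
time_mean = lazy_mean.compute(scheduler="threads", num_workers=2)

# The stored intermediate is reused instead of being recomputed in this step.
anomaly_std = (subset - time_mean).std("time").compute(scheduler="threads")
```

Swapping the order (reducing over the full array and subsetting afterwards) gives the same answer here but does more work, which is the kind of difference a test calculation could expose.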
One approach is to have some representative test calculations that can then be used as a target for optimisation. These test calculations can be run whenever there are infrastructure or algorithm changes, to check that performance has not degraded or to see whether it can be improved further.
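A minimal sketch of what such a check could look like, using only the standard library. The helper names (`benchmark`, `check_regression`), the 20% tolerance and the placeholder workload are assumptions for illustration; a real test calculation would be one of the proposed xarray/dask computations.

```python
import statistics
import time


def benchmark(func, repeats=3):
    """Run func() `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def check_regression(elapsed, baseline, tolerance=0.2):
    """Return True if `elapsed` is at most `tolerance` (as a fraction) slower than `baseline`."""
    return elapsed <= baseline * (1.0 + tolerance)


def test_calculation():
    # Placeholder workload standing in for a real, strenuous calculation.
    return sum(i * i for i in range(100_000))


elapsed = benchmark(test_calculation)
print(f"median wall time: {elapsed:.4f} s")
```

The recorded `elapsed` from a known-good setup could be stored as the baseline and passed to `check_regression` after each infrastructure or algorithm change.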
If that sounds like a useful idea then we need people to propose calculations that they know to be strenuous as possibilities for optimisation/best-practicification*. Ideally these would be fairly compact, reproducible chunks of code.
ping @AndyHoggANU @aekiss @adele-morrison @navidcy @angus-g
* not a real word
OK, here's one!
https://gist.github.com/navidcy/b12e5469d1a809cc4c9b447456da1fe5
(better viewed in nbviewer)
cc: @ongqingyee and @angus-g. @angus-g this is the one I was chatting with you about yesterday
I'm guessing that I should save the interpolated fields and reload them... But this might be just my random (or semi-educated) guess...
Actually, I've just noticed that this MnWE might not be as relevant here, since it does not use the cookbook
... Oh well....
The cookbook only really wraps the act of getting the data in the first place, so it's the actual (attempted) computation that's more important IMO. Thanks for the example! I'll take a look
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/cosima-cookbook-updating-needs/130/2