
pyaerocom usage of pyaro too slow

Open dulte opened this issue 1 year ago • 3 comments

The pyaro reader uses loops to convert pyaro data structures to UngriddedData. This has worked fine for the data read through pyaro until now, but with EEA, where there are millions of samples per species per month, this method is slow. The reader needs to be vectorized with numpy.
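A minimal sketch of the kind of change meant here, not the actual pyaerocom reader code: the function names, the three-column layout, and the synthetic input arrays are all assumptions made for illustration. It contrasts filling a 2D array one row at a time with filling it one column at a time using numpy slice assignment.

```python
import numpy as np


def convert_with_loop(values, stations, times):
    """Hypothetical row-by-row conversion: one Python iteration per sample."""
    out = np.empty((len(values), 3), dtype=np.float64)
    for i in range(len(values)):
        out[i, 0] = stations[i]
        out[i, 1] = times[i]
        out[i, 2] = values[i]
    return out


def convert_vectorized(values, stations, times):
    """Hypothetical vectorized conversion: one array assignment per column."""
    out = np.empty((len(values), 3), dtype=np.float64)
    out[:, 0] = stations
    out[:, 1] = times
    out[:, 2] = values
    return out


# Synthetic stand-in data (assumption): 1000 samples spread over 50 stations.
rng = np.random.default_rng(0)
n = 1000
values = rng.random(n)
stations = rng.integers(0, 50, size=n)
times = np.arange(n, dtype=np.float64)

loop_result = convert_with_loop(values, stations, times)
vec_result = convert_vectorized(values, stations, times)
```

Both functions produce the same array. The vectorized version moves the per-sample work out of the Python interpreter, which is where the cost grows when there are millions of samples.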

dulte avatar Aug 09 '24 12:08 dulte

PM10 for one month is >5000 files, and each file should be a single station. I also need to check whether this is actually the case, or whether there is overlap between files for some reason.

dulte avatar Aug 09 '24 12:08 dulte

From the pyaerocom meeting: This is related to the caching. We should talk about this tomorrow at the Design Retreat. Previously, we had explicitly chosen not to have caching in pyaro.

lewisblake avatar Aug 12 '24 09:08 lewisblake

Not only. It is also slow because of the way I made it convert between pyaro and UngriddedData. I foresaw this when I made the reader, so the time has come to do something about it. Or is it the other issue you are talking about?

dulte avatar Aug 12 '24 11:08 dulte

Notes from pyaerocom 28.10.24:

Two requests from pyaro:

  1. We want a cheap way to access the data_revision without reading the data.
  2. We want a way to read temporal subsets of the data.
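As an illustration of the second request only, here is a small sketch of selecting a half-open time window `[begin, end)` with a numpy boolean mask. The function name `select_time_range` and the sample arrays are hypothetical and are not part of pyaro. The sketch filters data that is already in memory, whereas the request is for the reader to avoid loading data outside the window in the first place.

```python
import numpy as np


def select_time_range(start_times, values, begin, end):
    """Hypothetical helper: keep samples with begin <= start_time < end."""
    mask = (start_times >= begin) & (start_times < end)
    return start_times[mask], values[mask]


# Synthetic daily samples (assumption): 1 to 10 August 2024, values 0..9.
start_times = np.arange("2024-08-01", "2024-08-11", dtype="datetime64[D]")
values = np.arange(10, dtype=np.float64)

sub_times, sub_values = select_time_range(
    start_times,
    values,
    np.datetime64("2024-08-03"),
    np.datetime64("2024-08-06"),
)
```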

lewisblake avatar Oct 28 '24 10:10 lewisblake

@magnusuMET fixed this in #1401, eliminating the use of loops and thereby speeding up the code. Closing until this becomes relevant again.

lewisblake avatar Nov 18 '24 10:11 lewisblake