pyaerocom usage of pyaro too slow
The pyaro reader uses loops to convert pyaro data structures to ungriddeddata. This has worked fine for the data read through pyaro until now, but with EEA, where there are millions of samples per species per month, this method is slow. The reader needs to be numpyfied.
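To illustrate what "numpyfied" could mean here, below is a minimal sketch, not the actual pyaerocom reader code. It assumes the samples arrive as a NumPy structured array with hypothetical fields `station`, `time` and `value`, and it compares a per-sample Python loop with whole-column assignments that fill the same 2D float array. The station names and the output column layout are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for the per-sample records a reader returns.
# Field names and station ids are illustrative, not pyaro's actual layout.
dtype = [("station", "U16"), ("time", "datetime64[h]"), ("value", "f8")]
samples = np.zeros(3, dtype=dtype)
samples["station"] = ["NO0001", "NO0001", "NO0002"]
samples["time"] = np.array(
    ["2024-01-01T00", "2024-01-01T01", "2024-01-01T00"], dtype="datetime64[h]"
)
samples["value"] = [12.5, 13.1, 8.4]


def convert_with_loop(samples):
    """Fill the output row by row: one Python-level iteration per sample."""
    # Map each station name to its position in the sorted list of names.
    station_index = {
        name: i for i, name in enumerate(np.unique(samples["station"]))
    }
    out = np.empty((len(samples), 3), dtype=np.float64)
    for i, row in enumerate(samples):
        out[i, 0] = station_index[row["station"]]
        # Seconds since the Unix epoch.
        out[i, 1] = row["time"].astype("datetime64[s]").astype(np.int64)
        out[i, 2] = row["value"]
    return out


def convert_vectorized(samples):
    """Fill the output column by column with whole-array operations."""
    # return_inverse gives, for every sample, the index of its station
    # in the sorted array of unique station names.
    _, station_idx = np.unique(samples["station"], return_inverse=True)
    out = np.empty((len(samples), 3), dtype=np.float64)
    out[:, 0] = station_idx
    out[:, 1] = samples["time"].astype("datetime64[s]").astype(np.int64)
    out[:, 2] = samples["value"]
    return out


loop_result = convert_with_loop(samples)
vectorized_result = convert_vectorized(samples)
```

Both functions produce the same array. The loop version does a fixed amount of interpreter work per sample, which is what becomes expensive with millions of samples, while the vectorized version does three column assignments regardless of the sample count.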
PM10 for one month is >5000 files, and each file should be a single station. I also need to check whether this is the case, or whether there is overlap between files for some reason.
From the pyaerocom meeting: This is related to the caching. We should talk about this tomorrow at the Design Retreat. Previously we explicitly did not have caching in pyaro.
Not only. It is also slow because of the way I made it convert between pyaro and ungriddeddata. I foresaw this when I made the reader, so the time has come to do something about it. Or is it the other issue you are talking about?
Notes from the pyaerocom meeting 28.10.24:
Two requests from pyaro:
- We want a cheap way to access the data_revision without reading the data.
- We want a way to read temporal subsets of the data.
@magnusuMET fixed this in #1401 by eliminating the use of loops, thereby speeding up the code. Closing until this becomes relevant again.