earthkit-data
Improve order_by speed for GRIB data
The speed of order_by() is primarily a problem for "file" sources. Each time the sorting algorithm accesses a field's metadata, the GRIB message has to be loaded from the GRIB file and decoded again.
For example, this code runs in 10.43 s:
import earthkit.data
ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
for _ in range(200):
    ds.order_by("shortName")
If we store all the messages in memory, the running time goes down significantly, to 4.65 s:
import earthkit.data
ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
x = ds.to_fieldlist("numpy")
for _ in range(200):
    x.order_by("shortName")
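To make the cost pattern concrete, here is a minimal pure-Python sketch of why repeated metadata access hurts for file-backed fields and how caching the looked-up keys avoids it. It does not use earthkit-data and is not how the library is implemented: FileBackedField, CachedField and the free function order_by are hypothetical stand-ins, and the "load" is only a counter standing in for reading and decoding a message.

```python
class FileBackedField:
    """Hypothetical stand-in for a field whose GRIB message lives on disk."""

    def __init__(self, keys):
        self._keys = keys
        self.loads = 0  # how many times the message was "loaded"

    def metadata(self, key):
        # Assumption: every metadata access re-reads and decodes the message.
        self.loads += 1
        return self._keys[key]


class CachedField:
    """Hypothetical wrapper that remembers metadata values already read."""

    def __init__(self, field):
        self._field = field
        self._cache = {}

    def metadata(self, key):
        if key not in self._cache:
            self._cache[key] = self._field.metadata(key)
        return self._cache[key]


def order_by(fields, key):
    # Simplified sort: one metadata access per field per call.
    return sorted(fields, key=lambda f: f.metadata(key))


names = ("v", "t", "u")

# Uncached: every order_by() call triggers a load for every field.
raw = [FileBackedField({"shortName": n}) for n in names]
for _ in range(200):
    order_by(raw, "shortName")
uncached_loads = sum(f.loads for f in raw)

# Cached: each field is loaded once, later calls hit the cache.
backing = [FileBackedField({"shortName": n}) for n in names]
cached = [CachedField(f) for f in backing]
for _ in range(200):
    result = order_by(cached, "shortName")
cached_loads = sum(f.loads for f in backing)
sorted_names = [f.metadata("shortName") for f in result]
```

In this toy model the number of loads grows with the number of order_by() calls in the uncached case and stays at one per field in the cached case. Whether a metadata cache or in-memory messages is the better fix for "file" sources is a separate trade-off between memory use and speed.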