
Improve order_by speed for GRIB data

Open sandorkertesz opened this issue 1 year ago • 0 comments

The speed of order_by() is primarily a problem for "file" sources. The cause is that every metadata access the sorting algorithm makes on a field requires the GRIB message to be loaded and decoded from the GRIB file again.

For example, this code runs in 10.43 s:

import earthkit.data

# File-backed source: each metadata access reads the message from the GRIB file
ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
for _ in range(200):
    ds.order_by("shortName")

If we store all the messages in memory, the running time drops to 4.65 s:

import earthkit.data

ds = earthkit.data.from_source("file", "docs/examples/tuv_pl.grib")
# Convert to an in-memory fieldlist so the messages are not re-read from disk
x = ds.to_fieldlist("numpy")
for _ in range(200):
    x.order_by("shortName")
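
One possible direction, sketched here without any earthkit code, is to read each field's sort key once and sort on the cached values. The sketch assumes the sort queries metadata on every comparison, which is an assumption and not a statement about the actual order_by() implementation. `FakeField`, `sort_reloading` and `sort_cached` are hypothetical names used only for illustration; `FakeField.loads` counts simulated message loads.

```python
from functools import cmp_to_key


class FakeField:
    """Hypothetical stand-in for a file-backed GRIB field (not an earthkit class)."""

    loads = 0  # number of simulated message loads, shared by all instances

    def __init__(self, short_name):
        self._short_name = short_name

    def metadata(self, key):
        # Every call simulates loading and decoding the message again.
        FakeField.loads += 1
        if key != "shortName":
            raise KeyError(key)
        return self._short_name


def sort_reloading(fields, key):
    # Comparison-based sort that asks for metadata on every comparison.
    def compare(a, b):
        x, y = a.metadata(key), b.metadata(key)
        return (x > y) - (x < y)

    return sorted(fields, key=cmp_to_key(compare))


def sort_cached(fields, key):
    # Read each field's key exactly once, then sort indices on the cached values.
    cached = [f.metadata(key) for f in fields]
    order = sorted(range(len(fields)), key=cached.__getitem__)
    return [fields[i] for i in order]


fields = [FakeField(name) for name in ["v", "t", "u"] * 6]

FakeField.loads = 0
by_reloading = sort_reloading(fields, "shortName")
reloading_loads = FakeField.loads

FakeField.loads = 0
by_cached = sort_cached(fields, "shortName")
cached_loads = FakeField.loads

print(f"loads with reloading: {reloading_loads}, with cached keys: {cached_loads}")
```

With cached keys the number of loads equals the number of fields, while the reloading variant pays two loads per comparison. Both variants return the fields in the same order because both sorts are stable.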

sandorkertesz, Nov 16 '23 09:11