earthkit-data Internally process the request kwargs to optimize caching

Is your feature request related to a problem? Please describe.

The following snippet triggers two requests to the CDS, although the only difference between the requests is the order of the requested variables.

import earthkit.data
from earthkit.data import settings

settings.auto_save_settings = False
settings.set("cache-policy", "temporary")

request_kwargs = {
    "product_type": "reanalysis",
    "area": [50, -10, 40, 10],  # N,W,S,E
    "grid": [2, 2],
    "date": "2012-05-10",
    "time": "12:00",
}
for reverse in (True, False):
    earthkit.data.from_source(
        "cds",
        "reanalysis-era5-single-levels",
        variable=sorted(["2t", "msl"], reverse=reverse),
        **request_kwargs
    )

Describe the solution you'd like

Internally process the request kwargs to optimize caching. For example, sort lists (all but area/grid), squeeze single-element lists (or do the opposite), change types when possible, ...
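A minimal sketch of the kind of normalization meant here. `normalize_request` and `ORDER_SENSITIVE_KEYS` are hypothetical names for illustration, not part of earthkit-data or cdsapi, and the choice to wrap scalars in lists and cast every element to `str` is an assumption about what a canonical form could look like.

```python
# Hypothetical helper for illustration; not an earthkit-data or cdsapi API.

# Assumption: for these keys the element order carries meaning
# (e.g. area is N, W, S, E), so their lists must not be sorted.
ORDER_SENSITIVE_KEYS = {"area", "grid"}


def normalize_request(request):
    """Return a canonical copy of a request dict."""
    normalized = {}
    for key in sorted(request):
        value = request[key]
        # Wrap scalars so that "2t" and ["2t"] end up identical.
        if not isinstance(value, (list, tuple)):
            value = [value]
        # Cast elements to str so that 12 and "12" compare equal.
        value = [str(item) for item in value]
        # Sort everything except the order-sensitive keys.
        if key not in ORDER_SENSITIVE_KEYS:
            value = sorted(value)
        normalized[key] = value
    return normalized
```

With this, the two requests in the snippet above would normalize to the same dictionary, while `area` and `grid` keep their original order.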

Describe alternatives you've considered

No response

Additional context

No response

Organisation

B-Open / CADS-EQC

Sep 29 '23 16:09 malmans2

> Internally process the request kwargs to optimize caching. For example, sort lists (all but area/grid), squeeze single-element lists (or do the opposite), change types when possible, ...

Shouldn't all of these be implemented in cdsapi itself?

Oct 06 '23 08:10 sandorkertesz

I don't think cdsapi uses any local cache, does it?

I'd expect that earthkit uses the CDS request dictionary to construct the keys/hashes of its own cache items. When a key/hash is already present in the cache database (i.e., the corresponding CDS file is already available in the cache), I'd expect that earthkit doesn't use cdsapi at all.

But maybe I misunderstood how the earthkit cache works.
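To illustrate that expectation, here is a rough sketch, not how earthkit actually implements its cache. `cache_key`, `retrieve`, the dict-based `cache`, and the `download` callable are all hypothetical names made up for this example.

```python
import hashlib
import json


def cache_key(source, dataset, request):
    """Hash the source name, dataset name and request dict into a key."""
    # sort_keys=True makes the key independent of dict key order,
    # but the order of items inside lists still affects the hash.
    payload = json.dumps([source, dataset, request], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def retrieve(cache, dataset, request, download):
    """Return the cached result, calling download() only on a cache miss."""
    key = cache_key("cds", dataset, request)
    if key not in cache:
        # Cache miss: this is the only place the remote service is contacted.
        cache[key] = download(dataset, request)
    return cache[key]
```

In this sketch, two requests that differ only in the order of the `variable` list hash to different keys, so `download` is called twice, which is the behaviour described in the issue.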

Oct 06 '23 09:10 malmans2

I am sorry, but I thought that "sort lists (all but area/grid), squeeze single element lists (or do the opposite), change types when possible, ..." were not about caching.

Oct 06 '23 10:10 sandorkertesz