Internally process the request kwargs to optimize caching
Is your feature request related to a problem? Please describe.
The following snippet triggers two requests to the CDS, although the only difference between them is the order of the requested variables.
import earthkit.data
from earthkit.data import settings

settings.auto_save_settings = False
settings.set("cache-policy", "temporary")

request_kwargs = {
    "product_type": "reanalysis",
    "area": [50, -10, 40, 10],  # N, W, S, E
    "grid": [2, 2],
    "date": "2012-05-10",
    "time": "12:00",
}

# Same request twice; only the order of the "variable" list differs.
for reverse in (True, False):
    earthkit.data.from_source(
        "cds",
        "reanalysis-era5-single-levels",
        variable=sorted(["2t", "msl"], reverse=reverse),
        **request_kwargs,
    )
Describe the solution you'd like
Internally process the request kwargs to optimize caching. For example, sort lists (all but area/grid), squeeze single element lists (or do the opposite), change types when possible, ...
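To make the proposal concrete, here is a minimal sketch of what such a normalisation step could look like. The names `normalise_request` and `ORDER_SENSITIVE_KEYS` are hypothetical and are not part of earthkit-data or cdsapi. The sketch takes the "do the opposite" route (scalars are wrapped into one-element lists) and converts every value to a string, which is only one of several possible type conventions.

```python
# Hypothetical helper: NOT an existing earthkit-data or cdsapi function.
# Keys whose element order carries meaning and must not be sorted.
ORDER_SENSITIVE_KEYS = {"area", "grid"}


def normalise_request(request):
    """Return a copy of `request` in a canonical form.

    - every value becomes a list of strings
    - lists are sorted, except for the order-sensitive keys
    """
    normalised = {}
    for key, value in request.items():
        if isinstance(value, (list, tuple)):
            items = [str(v) for v in value]
            if key not in ORDER_SENSITIVE_KEYS:
                items = sorted(items)
        else:
            # Wrap scalars so "12:00" and ["12:00"] compare equal.
            items = [str(value)]
        normalised[key] = items
    return normalised


first = normalise_request({"variable": ["msl", "2t"], "time": "12:00"})
second = normalise_request({"variable": ["2t", "msl"], "time": ["12:00"]})
print(first == second)
```

With this convention the two requests from the snippet above would compare equal, while `area` and `grid` keep their original order. Note that string conversion still treats `50` and `50.0` as different values, so a real implementation would need a more careful rule for numbers.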
Describe alternatives you've considered
No response
Additional context
No response
Organisation
B-Open / CADS-EQC
Internally process the request kwargs to optimize caching. For example, sort lists (all but area/grid), squeeze single element lists (or do the opposite), change types when possible, ...
Shouldn't all of these be implemented in cdsapi itself?
I don't think cdsapi uses any local cache, does it?
I'd expect earthkit to use the CDS request dictionary to construct the keys/hashes of its own cache items. When a key/hash is already present in the cache database (i.e., the corresponding CDS file is already in the cache), I'd expect earthkit not to call cdsapi at all.
But maybe I misunderstood how the earthkit cache works.
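As an illustration of why list order matters under that expectation, here is a minimal sketch of a request-derived cache key. It assumes the key is a hash of the JSON-serialised request; this is an assumption for the sake of the example, not a description of how earthkit actually builds its keys, and `make_cache_key` is a hypothetical name.

```python
import hashlib
import json


def make_cache_key(dataset, request):
    """Hypothetical cache key: SHA-256 of the JSON-serialised request."""
    # sort_keys=True makes the key independent of dictionary key order,
    # but the order of elements inside lists is still preserved.
    payload = json.dumps([dataset, request], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


dataset = "reanalysis-era5-single-levels"
key_a = make_cache_key(dataset, {"variable": ["2t", "msl"], "time": "12:00"})
key_b = make_cache_key(dataset, {"time": "12:00", "variable": ["2t", "msl"]})
key_c = make_cache_key(dataset, {"variable": ["msl", "2t"], "time": "12:00"})

print(key_a == key_b)  # same content, different dict key order
print(key_a == key_c)  # same content, different list order
```

Under this scheme, reordering dictionary keys yields the same cache key, but reordering the `variable` list yields a different one, so the second request would miss the cache unless the request is normalised before hashing.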
I am sorry, but I thought that "sort lists (all but area/grid), squeeze single element lists (or do the opposite), change types when possible, ..." was not about caching.