cfgrib

Read some variables by 'filter_by_keys'

Open wangrenz opened this issue 4 years ago • 11 comments

Hi,

I want to read the GFS u and v variables, like this:

ds = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                                   backend_kwargs={ 'filter_by_keys':{ 'cfVarName': ['u','v'],  'typeOfLevel':'isobaricInhPa'},'indexpath':''})

But it reports an error:

Traceback (most recent call last):
  File "read_uv.py", line 20, in _read
    backend_kwargs={ 'filter_by_keys':{ 'cfVarName': ['u','v'],  'typeOfLevel':'isobaricInhPa'},'indexpath':''})
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 908, in open_mfdataset
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 908, in <listcomp>
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 520, in open_dataset
    filename_or_obj, lock=lock, **backend_kwargs
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/cfgrib_.py", line 43, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/dataset.py", line 641, in open_file
    return Dataset(*build_dataset_components(index, read_keys=read_keys, **kwargs))
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/dataset.py", line 563, in build_dataset_components
    for param_id in index['paramId']:
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/messages.py", line 359, in __getitem__
    return self.header_values[item]
KeyError: 'paramId'

How can I read only a few selected variables?

wangrenz avatar Apr 26 '20 09:04 wangrenz

@wangrenz sorry for the late reply. I think you are hitting the problem with MULTI-FIELD messages that I'm tracking in #111 (cfgrib has problem accessing the v-component of GFS).

I just committed a tentative fix in branch stable/0.9.8.x, if you want to try it out.

alexamici avatar May 21 '20 15:05 alexamici

I "think" the root cause of this particular issue is not being able to access the v-component of a MULTI-FIELD message, so I "think" the issue is actually resolved by the release of version 0.9.8.2.

Feel free to reopen it if it still persists, but I would need some more detail in that case.

alexamici avatar May 22 '20 12:05 alexamici

Thanks for your work.

But I still have the same problem when reading only the u and v variables. The error message is the same as above.

Is the cfVarName key not supported?

Regards.

wangrenz avatar May 23 '20 09:05 wangrenz

Hi there. I'm having the same issue with these grib2 files

Trying to read solely the tp variable like this:

wrf = xa.open_dataset('WRF_cpt_05KM_2020071400_2020071400.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface', 'cfVarName': ['tp'] } } )

The error message is pretty much the same as the one @wangrenz gets:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-9898da9e54c5> in <module>()
----> 1 wrf = xa.open_dataset('WRF_cpt_05KM_2020071400_2020071400.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface', 'cfVarName': ['tp'] } } )
/usr/lib/python3/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
435         elif engine == 'cfgrib':
436             store = backends.CfGribDataStore(
--> 437                 filename_or_obj, lock=lock, **backend_kwargs)
438
439     else:
/usr/lib/python3/dist-packages/xarray/backends/cfgrib_.py in __init__(self, filename, lock, **backend_kwargs)
38             lock = ECCODES_LOCK
39         self.lock = ensure_lock(lock)
---> 40         self.ds = cfgrib.open_file(filename, **backend_kwargs)
41
42     def open_store_variable(self, name, var):
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, **kwargs)
649     index_keys = sorted(ALL_KEYS + read_keys)
650     index = open_fileindex(path, grib_errors, indexpath, index_keys).subindex(filter_by_keys)
--> 651     return Dataset(*build_dataset_components(index, read_keys=read_keys, **kwargs))
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims)
571     variables = collections.OrderedDict()
572     filter_by_keys = index.filter_by_keys
--> 573     for param_id in index['paramId']:
574         var_index = index.subindex(paramId=param_id)
575         try:
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/messages.py in __getitem__(self, item)
385     def __getitem__(self, item):
386         # type: (str) -> list
--> 387         return self.header_values[item]
388
389     def getone(self, item):
KeyError: 'paramId'

I also tried upgrading to 0.9.8.3, but without success.

fcannini avatar Jul 14 '20 18:07 fcannini

Can you please try the key shortName instead of cfVarName: {'typeOfLevel': 'surface', 'shortName': ['tp'] }

shahramn avatar Jul 14 '20 21:07 shahramn

@shahramn I've tried it too, same error.

fcannini avatar Jul 14 '20 23:07 fcannini

@wangrenz @shahramn I've managed to successfully open the files mentioned in my first comment by using solely backend_kwargs={'filter_by_keys' : {'shortName': 'tp'} }.

Strangely, in my case, using backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface'} } did not show the variable I'm interested in, so I tried the above.

fcannini avatar Jul 23 '20 16:07 fcannini

@fcannini Yes, only one variable can be read at a time.

e.g.

ds_u = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                                   backend_kwargs={ 'filter_by_keys':{ 'cfVarName': 'u',  'typeOfLevel':'isobaricInhPa'},'indexpath':''})
ds_v = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                                   backend_kwargs={ 'filter_by_keys':{ 'cfVarName': 'v',  'typeOfLevel':'isobaricInhPa'},'indexpath':''})

Each of these can be read successfully.

wangrenz avatar Jul 24 '20 13:07 wangrenz

Hi there,

thanks, this discussion helped me get my code for reading a subset of variables working. However, it would be convenient, and probably also more performant, to be able to supply a list of values per filter key ('cfVarName': ['u', 'v']) instead of opening a dataset for every single value and merging them afterwards. Are there any plans to support this in the future? If this is interesting to you, I could also try to provide an implementation for it.

phigre avatar Apr 17 '24 07:04 phigre
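
The workaround discussed in this thread (one dataset per filter value, merged afterwards) could be sketched roughly as follows. This is only a minimal sketch, not part of cfgrib: expand_filters and open_filtered are hypothetical helper names, and the sketch assumes that each scalar-only filter opens cleanly and that the resulting datasets share compatible coordinates so that xr.merge can combine them.

```python
from itertools import product


def expand_filters(filter_by_keys):
    """Split a filter dict whose values may be lists into scalar-only dicts.

    Hypothetical helper: returns one dict per combination of values, so each
    result contains only scalar values for every key.
    """
    keys = list(filter_by_keys)
    # Wrap scalars so that every key contributes a list of candidate values.
    value_lists = [
        list(v) if isinstance(v, (list, tuple)) else [v]
        for v in filter_by_keys.values()
    ]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


def open_filtered(path, filter_by_keys, **backend_kwargs):
    """Open one dataset per scalar filter and merge the results.

    Hypothetical helper: needs xarray and cfgrib installed, and assumes the
    per-variable datasets have coordinates that xr.merge can reconcile.
    """
    import xarray as xr  # imported lazily so expand_filters works without it

    datasets = [
        xr.open_dataset(
            path,
            engine="cfgrib",
            backend_kwargs={"filter_by_keys": f, **backend_kwargs},
        )
        for f in expand_filters(filter_by_keys)
    ]
    return xr.merge(datasets)


filters = expand_filters({"cfVarName": ["u", "v"], "typeOfLevel": "isobaricInhPa"})
print(filters)
# [{'cfVarName': 'u', 'typeOfLevel': 'isobaricInhPa'},
#  {'cfVarName': 'v', 'typeOfLevel': 'isobaricInhPa'}]
```

With a real GRIB file, a call such as open_filtered(path, {'cfVarName': ['u', 'v'], 'typeOfLevel': 'isobaricInhPa'}, indexpath='') would then open the file once per variable and merge the two datasets. It does not avoid the repeated opens, so it addresses the convenience concern but not the performance one.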