cf-xarray
DataArray coordinates are dropped
It looks like when I pull out a DataArray from a Dataset using cf_xarray, the coordinates are dropped (while xarray keeps them).
Here is an example:
import xarray as xr
import cf_xarray
import numpy as np
x = 10
y = 20
ds = xr.Dataset(
    dict(
        lon=xr.DataArray(
            np.arange(x), dims=("x"), attrs=dict(standard_name="longitude"),
        ),
        lat=xr.DataArray(
            np.arange(y), dims=("y"), attrs=dict(standard_name="latitude"),
        ),
        var=xr.DataArray(
            np.random.rand(x, y),
            dims=("x", "y"),
            attrs=dict(standard_name="cf_var_name"),
        ),
    )
)
ds = ds.set_coords(["lon", "lat"])
ds.cf.describe()
Axes:
X: []
Y: []
Z: []
T: []
Coordinates:
longitude: ['lon']
latitude: ['lat']
vertical: []
time: []
Cell Measures:
area: unsupported
volume: unsupported
Standard Names:
cf_var_name: ['var']
print(ds["var"].coords)
Coordinates:
lon (x) int64 0 1 2 3 4 5 6 7 8 9
lat (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
print(ds.cf["cf_var_name"].coords)
Coordinates:
*empty*
# OK if I pull out the DataArray directly from ds
ds["var"].cf.plot(x="longitude", y="latitude")
<matplotlib.collections.QuadMesh at 0x7fbf13acf690>
# Problem if I pull out the DataArray from ds.cf
ds.cf["cf_var_name"].cf.plot(x="longitude", y="latitude")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
628 try:
--> 629 var = self._coords[key]
630 except KeyError:
KeyError: 'longitude'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-7-072235ade112> in <module>
1 # Problem if I pull out the DataArray from ds.cf
----> 2 ds.cf["cf_var_name"].cf.plot(x="longitude", y="latitude")
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/cf_xarray/accessor.py in __call__(self, *args, **kwargs)
629 key_mappers=dict.fromkeys(self._keys, (_get_axis_coord_single,)),
630 )
--> 631 return self._plot_decorator(plot)(*args, **kwargs)
632
633 def __getattr__(self, attr):
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/cf_xarray/accessor.py in _plot_wrapper(*args, **kwargs)
597 xvar = self.accessor[kwargs["x"]]
598 else:
--> 599 xvar = self._obj[kwargs["x"]]
600 if "positive" in xvar.attrs:
601 if xvar.attrs["positive"] == "down":
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
638 def __getitem__(self, key: Any) -> "DataArray":
639 if isinstance(key, str):
--> 640 return self._getitem_coord(key)
641 else:
642 # xarray-style array indexing
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
631 dim_sizes = dict(zip(self.dims, self.shape))
632 _, key, var = _get_virtual_variable(
--> 633 self._coords, key, self._level_coords, dim_sizes
634 )
635
/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
169 ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
170 else:
--> 171 ref_var = variables[ref_name]
172
173 if var_name is None:
KeyError: 'longitude'
PS: This package is great! Let me know if I can help!
This is currently intentional. To get what you want, set ds["var"].attrs["coordinates"] = "lat lon"
Right now, if there are no coordinates or ancillary_variables attributes, cf_xarray will not attach any non-dimensional coordinate variables. (In the CF sense there are no explicit links between var and lat or lon.)
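To make that rule concrete, here is a minimal sketch in plain xarray of the selection logic as described above. Note that pick_with_cf_links is a hypothetical helper written for illustration, not cf_xarray's actual implementation; it assumes that only names listed in the coordinates or ancillary_variables attributes should be attached.

```python
import numpy as np
import xarray as xr


def pick_with_cf_links(ds, name):
    """Hypothetical helper: attach only variables named in CF link attributes."""
    attrs = ds[name].attrs
    linked = (
        attrs.get("coordinates", "").split()
        + attrs.get("ancillary_variables", "").split()
    )
    # Demote every non-index coordinate first, so nothing is attached implicitly.
    da = ds.reset_coords()[name]
    # Re-attach only the explicitly linked variables that exist in the Dataset.
    return da.assign_coords({c: ds[c] for c in linked if c in ds.variables})


ds = xr.Dataset(
    dict(var=(("x", "y"), np.zeros((10, 20)), dict(standard_name="cf_var_name"))),
    coords=dict(
        lon=("x", np.arange(10), dict(standard_name="longitude")),
        lat=("y", np.arange(20), dict(standard_name="latitude")),
    ),
)

# No CF link attribute yet: nothing is attached.
without_link = set(pick_with_cf_links(ds, "var").coords)

# Declaring the link explicitly makes lat and lon travel with the variable.
ds["var"].attrs["coordinates"] = "lat lon"
with_link = set(pick_with_cf_links(ds, "var").coords)
```

With the attribute set, with_link contains both lat and lon, while without_link is empty, which mirrors the difference reported in this issue.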
We could change the behaviour for this particular case (no coordinates or ancillary_variables attributes) to just include all non-dimensional coordinate variables, but I think that may be more confusing.
We should add an FAQ for these kinds of things. That would be a nice thing to contribute!
EDIT: xarray's heuristics are to keep non-dim coordinate variables when set(non_dim_coord.dims) <= set(dataarray.dims)
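A small sketch of that xarray heuristic, for comparison. The extra variable named profile is an assumption added purely for illustration; it has only the x dimension, so a coordinate defined on y cannot follow it.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    dict(
        var=(("x", "y"), np.zeros((10, 20))),
        profile=("x", np.zeros(10)),  # hypothetical 1-D variable
    ),
    coords=dict(
        lon=("x", np.arange(10)),
        lat=("y", np.arange(20)),
    ),
)

# Both lon (x) and lat (y) fit inside var's dims (x, y), so both are kept.
coords_2d = set(ds["var"].coords)

# lat lives on y, which profile does not have, so only lon is kept.
coords_1d = set(ds["profile"].coords)
```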
Ohh my bad, I missed that part in the documentation. Yes I think it makes sense to only use the variables in the coordinates attribute. Thanks!
I'll start the FAQ section!
I'm wondering whether it would be a good idea to retain cell measures that are not defined by CF conventions, i.e., apply to cell_measures the same behavior as coordinates, where all coordinates are retained but only latitude, longitude, vertical, and time are understood by cf_xarray.
For example, I'd like to associate a DataArray with cell area, thickness, and x/y widths. Except for area and volume, when I select a DataArray from a Dataset, the variables defined in the cell_measures attribute are not retained. To retain them, they need to be added to the coordinates attribute.
This would allow using coordinates for lat, lon, depth, and time, and cell measures for volume, area, thickness, widths, ...
~~Actually, this would apply when I create a "sub-Dataset", like sub_ds = ds.cf[[standard_name]].~~
Edit: Never mind, it would apply to both DataArrays and sub-Datasets pulled out from a Dataset. In the latter, the additional measures are also shown in describe as Standard Names.
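To illustrate the proposal, here is a rough sketch of how a cell_measures attribute could be split into names that are retained and names that are "understood". Everything here is an assumption for illustration: parse_cell_measures is a hypothetical helper, not part of cf_xarray, and the variable names (cell_area, cell_thickness, cell_dx) and the extra measure keys (thickness, xwidth) are made up.

```python
CF_MEASURES = ("area", "volume")  # the measure names treated as special in this thread


def parse_cell_measures(attr):
    """Hypothetical helper: parse 'measure: varname' pairs from a cell_measures string."""
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {attr!r}")
    measures = {}
    for key, varname in zip(tokens[::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"measure key {key!r} must end with ':'")
        measures[key[:-1]] = varname
    return measures


measures = parse_cell_measures(
    "area: cell_area thickness: cell_thickness xwidth: cell_dx"
)

# Under the proposal, every listed variable would be retained on selection...
retained = sorted(measures.values())

# ...but only the CF-defined measures would be given special meaning.
understood = {k: v for k, v in measures.items() if k in CF_MEASURES}
```

Here all three variables end up in retained, while understood only holds the area entry.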
Yes, sounds good to me. Variable names in cell_measures should become non-dim coordinate variables if they can. Can you modify get_associated_variable_names appropriately?
IIUC you are proposing that only area and volume be "special" measure names (as at present). :+1:
One issue is that .cf.cell_measures is now confusing, since it will only show area and volume. Maybe this could be a documentation fix?