cf-xarray

DataArray coordinates are dropped

Open malmans2 opened this issue 3 years ago • 5 comments

It looks like when I pull out a DataArray from a Dataset using cf_xarray the coordinates are dropped (while xarray keeps them). Here is an example:

import xarray as xr
import cf_xarray
import numpy as np
x = 10
y = 20
ds = xr.Dataset(
    dict(
        lon=xr.DataArray(
            np.arange(x), dims=("x",), attrs=dict(standard_name="longitude"),
        ),
        lat=xr.DataArray(
            np.arange(y), dims=("y",), attrs=dict(standard_name="latitude"),
        ),
        var=xr.DataArray(
            np.random.rand(x, y),
            dims=("x", "y"),
            attrs=dict(standard_name="cf_var_name"),
        ),
    )
)
ds = ds.set_coords(["lon", "lat"])
ds.cf.describe()
Axes:
	X: []
	Y: []
	Z: []
	T: []

Coordinates:
	longitude: ['lon']
	latitude: ['lat']
	vertical: []
	time: []

Cell Measures:
	area: unsupported
	volume: unsupported

Standard Names:
	cf_var_name: ['var']
print(ds["var"].coords)
Coordinates:
    lon      (x) int64 0 1 2 3 4 5 6 7 8 9
    lat      (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
print(ds.cf["cf_var_name"].coords)
Coordinates:
    *empty*
# OK if I pull out the DataArray directly from ds
ds["var"].cf.plot(x="longitude", y="latitude")
<matplotlib.collections.QuadMesh at 0x7fbf13acf690>

[image: plot output]

# Problem if I pull out the DataArray from ds.cf
ds.cf["cf_var_name"].cf.plot(x="longitude", y="latitude")
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    628         try:
--> 629             var = self._coords[key]
    630         except KeyError:


KeyError: 'longitude'


During handling of the above exception, another exception occurred:


KeyError                                  Traceback (most recent call last)

<ipython-input-7-072235ade112> in <module>
      1 # Problem if I pull out the DataArray from ds.cf
----> 2 ds.cf["cf_var_name"].cf.plot(x="longitude", y="latitude")


/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/cf_xarray/accessor.py in __call__(self, *args, **kwargs)
    629             key_mappers=dict.fromkeys(self._keys, (_get_axis_coord_single,)),
    630         )
--> 631         return self._plot_decorator(plot)(*args, **kwargs)
    632 
    633     def __getattr__(self, attr):


/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/cf_xarray/accessor.py in _plot_wrapper(*args, **kwargs)
    597                     xvar = self.accessor[kwargs["x"]]
    598                 else:
--> 599                     xvar = self._obj[kwargs["x"]]
    600                 if "positive" in xvar.attrs:
    601                     if xvar.attrs["positive"] == "down":


/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    638     def __getitem__(self, key: Any) -> "DataArray":
    639         if isinstance(key, str):
--> 640             return self._getitem_coord(key)
    641         else:
    642             # xarray-style array indexing


/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    631             dim_sizes = dict(zip(self.dims, self.shape))
    632             _, key, var = _get_virtual_variable(
--> 633                 self._coords, key, self._level_coords, dim_sizes
    634             )
    635 


/noc/msm/scratch/climate/malmans/miniconda3/envs/overflows/lib/python3.7/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
    169         ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
    170     else:
--> 171         ref_var = variables[ref_name]
    172 
    173     if var_name is None:


KeyError: 'longitude'

PS: This package is great! Let me know if I can help!

malmans2 • Oct 05 '20 14:10

This is currently intentional. To get what you want, add ds["var"].attrs["coordinates"] = "lat lon" (note that ds.var is the Dataset.var method, so the variable has to be accessed with ds["var"]).

Right now, if a variable has no coordinates or ancillary_variables attribute, cf_xarray will not attach any non-dimensional coordinate variables to it (in the CF sense, there are no explicit links between var and lat or lon).

We could change the behaviour for this particular case (no coordinates or ancillary_variables attributes) to just include all non-dimensional coordinate variables, but I think that may be more confusing.

We should add an FAQ for these kinds of things. That would be a nice thing to contribute!

EDIT: xarray's heuristics are to keep non-dim coordinate variables when set(non_dim_coord.dims) <= set(dataarray.dims)

dcherian • Oct 05 '20 15:10
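The subset rule quoted in the EDIT can be illustrated without xarray. This is a minimal plain-Python sketch, not xarray's actual code: `kept_coords` is a hypothetical helper, and the `depth` coordinate on dimension `z` is an assumption added to show a coordinate that would be dropped.

```python
def kept_coords(var_dims, coord_dims):
    """Return the names of coordinates whose dims are a subset of var_dims.

    Hypothetical helper modelling the rule
    set(non_dim_coord.dims) <= set(dataarray.dims).
    """
    needed = set(var_dims)
    return [name for name, dims in coord_dims.items() if set(dims) <= needed]


# Dimensions of the coordinates in the example Dataset, plus an assumed
# "depth" coordinate on a dimension that "var" does not have.
coord_dims = {"lon": ("x",), "lat": ("y",), "depth": ("z",)}

kept = kept_coords(("x", "y"), coord_dims)
print(kept)  # ['lon', 'lat']
```

Under this rule `lon` and `lat` follow `var` because their dimensions are contained in `("x", "y")`, which matches the output of `print(ds["var"].coords)` in the original report.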

Oh, my bad, I missed that part in the documentation. Yes, I think it makes sense to only use the variables listed in the coordinates attribute. Thanks!

I'll start the FAQ section!

malmans2 • Oct 06 '20 08:10
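The attribute-based linking agreed on here can be modelled in a few lines. This is a rough plain-Python sketch of the idea only, not cf_xarray's implementation: `linked_coords` is a hypothetical helper, and the attribute dictionaries stand in for `ds["var"].attrs` before and after the suggested fix.

```python
def linked_coords(attrs, available):
    """Return names from the CF "coordinates" attribute that exist in `available`.

    Hypothetical helper: the attribute is a whitespace-separated list of
    variable names, and a missing attribute means "no explicit links".
    """
    names = attrs.get("coordinates", "").split()
    return [name for name in names if name in available]


available = {"lon", "lat"}

# Attributes of "var" as in the original report (no explicit links).
without_attr = {"standard_name": "cf_var_name"}
# Attributes of "var" after adding the suggested "coordinates" attribute.
with_attr = {"standard_name": "cf_var_name", "coordinates": "lat lon"}

print(linked_coords(without_attr, available))  # []
print(linked_coords(with_attr, available))     # ['lat', 'lon']
```

Without the attribute nothing is linked, which is why the coordinates looked "dropped"; with it, `lat` and `lon` are named explicitly.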

I'm wondering whether it would be a good idea to retain cell measures that are not defined by CF conventions. I.e., apply the same behavior of coordinates to cell_measures, where all coordinates are retained, but only latitude, longitude, vertical, and time are understood by cf_xarray.

For example, I'd like to associate a DataArray with cell area, thickness, and x/y widths. Except for area and volume, the variables defined in the cell_measures attribute are not retained when I select a DataArray from a Dataset. To retain them, they currently need to be added to the coordinates attribute.

This would allow using coordinates for lat, lon, depth, and time, and cell measures for volume, area, thickness, widths, ...

malmans2 • Nov 16 '20 13:11

~~Actually, this would apply when I create a "sub-Dataset", like sub_ds = ds.cf[[standard_name]].~~ Edit: Never mind, it would apply to both DataArrays and sub-Datasets pulled out from a Dataset. In the latter, the additional measures are also shown in describe as Standard Names.

malmans2 • Nov 16 '20 13:11

Yes, sounds good to me. Variable names in cell_measures should become non-dim coordinate variables if they can. Can you modify get_associated_variable_names appropriately?

IIUC you are proposing that only area and volume be "special" measure names (as at present). :+1:

One issue is that .cf.cell_measures is now confusing, since it will only show area and volume. Maybe this could be a documentation fix?

dcherian • Nov 16 '20 14:11
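The proposal can be sketched as follows. A CF cell_measures attribute is a string of blank-separated "measure: variable_name" pairs. This is a plain-Python illustration under stated assumptions, not the code of get_associated_variable_names: `parse_cell_measures` is a hypothetical helper, the variable names are made up, and `thickness` is a non-CF measure of the kind discussed in the thread.

```python
STANDARD_MEASURES = {"area", "volume"}  # the only "special" measure names


def parse_cell_measures(attr):
    """Parse a cell_measures string into a {measure: variable_name} dict.

    Hypothetical helper; expects blank-separated "measure: name" pairs.
    """
    tokens = attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected 'measure: name' pairs, got {attr!r}")
    measures = {}
    for key, name in zip(tokens[::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"measure {key!r} must end with a colon")
        measures[key[:-1]] = name
    return measures


# Assumed attribute value with one non-CF measure ("thickness").
attr = "area: cell_area volume: cell_volume thickness: cell_thickness"
measures = parse_cell_measures(attr)

# Every listed variable would be retained as a non-dim coordinate ...
associated = list(measures.values())
# ... but only area and volume would be treated as "special" measures.
special = {k: v for k, v in measures.items() if k in STANDARD_MEASURES}

print(associated)  # ['cell_area', 'cell_volume', 'cell_thickness']
print(special)     # {'area': 'cell_area', 'volume': 'cell_volume'}
```

The split between `associated` and `special` mirrors the documentation concern raised above: all listed variables travel with the DataArray, while a view limited to area and volume shows fewer entries.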