xarray
unstack confusing re `Variable` / `IndexVariable`
What happened?
Using `unstack` on a DataArray generated with the `.dt.daysinmonth` accessor, with `time` as a MultiIndex, fails with a `ValueError`. The mysterious part is that when I build an "identical" DataArray starting from the `.data` of that same array, it works as expected (see output of example code).
I asked a colleague for help with this. She said the attached code worked with older versions of xarray and seems to have been broken since 2023.5.0.
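One way to look for a hidden difference between the two arrays is to compare the class of the variable each one wraps, since `identical` does not report it. The following is only a diagnostic sketch: the helper `variable_kind` is a hypothetical name, pandas datetimes stand in for the cftime dates of the report, and the class printed for `days0` may depend on the installed xarray version.

```python
import numpy as np
import pandas as pd
import xarray as xr


def variable_kind(da: xr.DataArray) -> str:
    """Return the class name of the variable wrapped by a DataArray."""
    return type(da.variable).__name__


# Assumption: pandas datetimes replace the cftime dates used in the report.
time = pd.date_range("1979-01-01", periods=24, freq="MS")
dss = xr.DataArray(np.arange(24), dims=["time"], coords={"time": time})

# Version 1: straight from the accessor.
days0 = dss["time"].dt.daysinmonth

# Version 2: rebuilt from the raw values of version 1.
days = xr.DataArray(
    np.asarray(days0.data),
    dims=["time"],
    coords={"time": dss["time"]},
    attrs=days0.attrs,
    name=days0.name,
)

print(f"days0 wraps: {variable_kind(days0)}")
print(f"days wraps:  {variable_kind(days)}")
print(f"identical:   {days.identical(days0)}")
```

If the two class names differ while `identical` still returns `True`, that would explain why the arrays behave differently in `unstack`.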
What did you expect to happen?
I expected to get a DataArray (`days0`) with dimensions `('year', 'month')` and sizes `(2, 12)`, which is what I get with the alternate DataArray (called `days`).
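The expected result can be sketched by generalising the rebuild-from-`.data` step that works in the report. The helper name `rebuild_as_plain` is hypothetical, and pandas datetimes are assumed in place of cftime dates to keep the sketch dependency-light; this is a possible workaround, not a fix for the underlying behaviour.

```python
import numpy as np
import pandas as pd
import xarray as xr


def rebuild_as_plain(da: xr.DataArray) -> xr.DataArray:
    """Rebuild a DataArray from its raw values, keeping dims, coords, attrs and name."""
    return xr.DataArray(
        np.asarray(da.data),
        dims=da.dims,
        coords=dict(da.coords),
        attrs=da.attrs,
        name=da.name,
    )


# Assumption: pandas datetimes replace the cftime dates used in the report.
time = pd.date_range("1979-01-01", periods=24, freq="MS")
dss = xr.DataArray(np.arange(24), dims=["time"], coords={"time": time})

# Rebuild first, then follow the same steps as the example code.
days_fixed = rebuild_as_plain(dss["time"].dt.daysinmonth)
days_fixed = days_fixed.assign_coords(
    year=("time", dss["time"].dt.year.data),
    month=("time", dss["time"].dt.month.data),
)
days_fixed = days_fixed.set_index(time=["year", "month"])
unstacked = days_fixed.unstack("time")

print(f"{unstacked.dims = }")
print(f"{dict(unstacked.sizes) = }")
```

Rebuilding before `assign_coords` and `set_index` mirrors the order used for `days` in the example code.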
Minimal Complete Verifiable Example
import sys
print(f"python {sys.version}")
import xarray as xr
import numpy as np
import cftime
print(f"numpy: {np.__version__}, xarray: {xr.__version__}, cftime: {cftime.__version__}")
t = np.array([cftime.DatetimeGregorian(1979, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 3, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 4, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 6, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 7, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 8, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 9, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 10, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 11, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1979, 12, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 1, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 2, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 3, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 4, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 5, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 6, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 7, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 8, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 9, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 11, 1, 0, 0, 0, 0, has_year_zero=False),
cftime.DatetimeGregorian(1980, 12, 1, 0, 0, 0, 0, has_year_zero=False)])
dss = xr.DataArray(t, dims=['time'], coords={"time":t})
# TWO VERSIONS OF "days":
days0 = dss['time'].dt.daysinmonth
days = xr.DataArray(dss['time'].dt.daysinmonth.data, dims=['time'], coords={'time':dss['time']}, attrs=days0.attrs, name='days_in_month')
print(f"IDENTICAL: {days.identical(days0)}")
year = dss['time'].dt.year.data
month = dss['time'].dt.month.data
# REPEAT SAME STEPS FOR days and days0:
days = days.assign_coords(year=("time", year), month=("time", month))
days = days.set_index(time=['year', 'month'])
days0 = days0.assign_coords(year=("time", year), month=("time", month))
days0 = days0.set_index(time=['year', 'month'])
print(f"IDENTICAL: {days.identical(days0)}")
days = days.unstack('time') # THIS WORKS
print(f"{days.dims = }")
#
days0 = days0.unstack('time') # THIS FAILS
print(f"{days0.dims = }")
MVCE confirmation
- [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
python 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:36:57) [Clang 15.0.7 ]
numpy: 1.26.4, xarray: 2024.5.0, cftime: 1.6.3
IDENTICAL: True
IDENTICAL: True
days.dims = ('year', 'month')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 55
53 print(f"{days.dims = }")
54 #
---> 55 days0 = days0.unstack('time') # THIS FAILS
56 print(f"{days0.dims = }")
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
111 kwargs.update({name: arg for name, arg in zip_args})
113 return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataarray.py:2950, in DataArray.unstack(self, dim, fill_value, sparse)
2888 @_deprecate_positional_args("v2023.10.0")
2889 def unstack(
2890 self,
(...)
2894 sparse: bool = False,
2895 ) -> Self:
2896 """
2897 Unstack existing dimensions corresponding to MultiIndexes into
2898 multiple new dimensions.
(...)
2948 DataArray.stack
2949 """
-> 2950 ds = self._to_temp_dataset().unstack(dim, fill_value=fill_value, sparse=sparse)
2951 return self._from_temp_dataset(ds)
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
111 kwargs.update({name: arg for name, arg in zip_args})
113 return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataset.py:5663, in Dataset.unstack(self, dim, fill_value, sparse)
5659 result = result._unstack_full_reindex(
5660 d, stacked_indexes[d], fill_value, sparse
5661 )
5662 else:
-> 5663 result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
5664 return result
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataset.py:5496, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
5493 else:
5494 fill_value_ = fill_value
-> 5496 variables[name] = var._unstack_once(
5497 index=clean_index,
5498 dim=dim,
5499 fill_value=fill_value_,
5500 sparse=sparse,
5501 )
5502 else:
5503 variables[name] = var
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:1552, in Variable._unstack_once(self, index, dim, fill_value, sparse)
1547 # Indexer is a list of lists of locations. Each list is the locations
1548 # on the new dimension. This is robust to the data being sparse; in that
1549 # case the destinations will be NaN / zero.
1550 data[(..., *indexer)] = reordered
-> 1552 return self._replace(dims=new_dims, data=data)
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:957, in Variable._replace(self, dims, data, attrs, encoding)
955 if encoding is _default:
956 encoding = copy.copy(self._encoding)
--> 957 return type(self)(dims, data, attrs, encoding, fastpath=True)
File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:2625, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2623 super().__init__(dims, data, attrs, encoding, fastpath)
2624 if self.ndim != 1:
-> 2625 raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")
2627 # Unlike in Variable, always eagerly load values into memory
2628 if not isinstance(self._data, PandasIndexingAdapter):
ValueError: IndexVariable objects must be 1-dimensional
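The last two frames show `Variable._replace` calling `type(self)(...)` and `IndexVariable.__init__` rejecting anything that is not 1-dimensional. The constraint itself can be reproduced in isolation; the sketch below uses made-up integer data and only public classes (`xr.IndexVariable`, `xr.Variable`), and it does not go through `unstack` at all.

```python
import numpy as np
import xarray as xr

# A 1-D IndexVariable is fine.
index_var = xr.IndexVariable("time", np.arange(24))

# Constructing a 2-D IndexVariable raises, as in the traceback above.
error_message = None
try:
    xr.IndexVariable(("year", "month"), np.arange(24).reshape(2, 12))
except ValueError as err:
    error_message = str(err)
    print(f"ValueError: {err}")

# A base Variable has no such restriction, so the same values can be 2-D.
base_var = index_var.to_base_variable()
reshaped = xr.Variable(("year", "month"), base_var.values.reshape(2, 12))
print(type(base_var).__name__, reshaped.dims)
```

This suggests that any code path that reshapes an `IndexVariable` into two dimensions while preserving its class would hit the same error.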
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:36:57) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.5.0
distributed: 2024.5.0
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.5.0
cupy: None
pint: 0.24.1
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: None