
unstack confusing re `Variable` / `IndexVariable`

Open · brianpm opened this issue 7 months ago · 7 comments

What happened?

Using `unstack` on a DataArray generated with the `.dt.daysinmonth` accessor fails with a `ValueError` once `time` has been turned into a MultiIndex. The mysterious part is that when I build an "identical" DataArray starting from the `.data` of that same array, it works as expected (see the output of the example code).

I asked a colleague for help with this. She said the attached code worked with older versions of xarray but seems to be broken starting with 2023.5.0.

What did you expect to happen?

I expected to get a DataArray (`days0`) with dimensions `('year', 'month')` and sizes `(2, 12)`, which is what I get with the alternative DataArray (`days`).
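
As a plain-Python illustration of the expected result (not xarray's implementation), unstacking the `(year, month)` MultiIndex should scatter the 24 monthly values into a grid with one row per year and one column per month. The helper `unstack_2d` is hypothetical, and the standard-library `calendar` module stands in for the `.dt.daysinmonth` accessor.

```python
import calendar


def unstack_2d(labels, values, fill=None):
    """Hypothetical helper: scatter values keyed by (outer, inner) labels into a 2-D grid.

    Returns (outer_levels, inner_levels, grid), where grid[i][j] holds the value for
    (outer_levels[i], inner_levels[j]), or `fill` if that pair does not occur.
    Levels are sorted here for simplicity.
    """
    outer_levels = sorted({o for o, _ in labels})
    inner_levels = sorted({i for _, i in labels})
    grid = [[fill] * len(inner_levels) for _ in outer_levels]
    for (o, i), v in zip(labels, values):
        grid[outer_levels.index(o)][inner_levels.index(i)] = v
    return outer_levels, inner_levels, grid


# The 24 (year, month) pairs that set_index(time=['year', 'month']) builds in the MVCE.
labels = [(y, m) for y in (1979, 1980) for m in range(1, 13)]
# calendar uses the proleptic Gregorian calendar, which matches cftime's
# standard Gregorian calendar for these years.
values = [calendar.monthrange(y, m)[1] for y, m in labels]

years, months, grid = unstack_2d(labels, values)
print((len(years), len(months)))  # (2, 12)
print(grid[1][1])  # 29 (February 1980)
```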

Minimal Complete Verifiable Example

import sys
print(f"python {sys.version}")
import xarray as xr
import numpy as np
import cftime
print(f"numpy: {np.__version__}, xarray: {xr.__version__}, cftime: {cftime.__version__}")
t = np.array([cftime.DatetimeGregorian(1979, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 3, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 4, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 5, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 6, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 7, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 8, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 9, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 10, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 11, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1979, 12, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 2, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 3, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 4, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 5, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 6, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 7, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 8, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 9, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 11, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(1980, 12, 1, 0, 0, 0, 0, has_year_zero=False)])
dss = xr.DataArray(t, dims=['time'], coords={"time":t})

# TWO VERSIONS OF "days":
days0 = dss['time'].dt.daysinmonth

days = xr.DataArray(dss['time'].dt.daysinmonth.data, dims=['time'], coords={'time':dss['time']}, attrs=days0.attrs, name='days_in_month')

print(f"IDENTICAL: {days.identical(days0)}")

year = dss['time'].dt.year.data
month = dss['time'].dt.month.data

# REPEAT SAME STEPS FOR days and days0:
days = days.assign_coords(year=("time", year), month=("time", month))
days = days.set_index(time=['year', 'month'])

days0 = days0.assign_coords(year=("time", year), month=("time", month))
days0 = days0.set_index(time=['year', 'month'])

print(f"IDENTICAL: {days.identical(days0)}")

days = days.unstack('time') # THIS WORKS
print(f"{days.dims = }")
#
days0 = days0.unstack('time') # THIS FAILS
print(f"{days0.dims = }")

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

python 3.12.0 | packaged by conda-forge | (main, Oct  3 2023, 08:36:57) [Clang 15.0.7 ]
numpy: 1.26.4, xarray: 2024.5.0, cftime: 1.6.3
IDENTICAL: True
IDENTICAL: True
days.dims = ('year', 'month')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 55
     53 print(f"{days.dims = }")
     54 #
---> 55 days0 = days0.unstack('time') # THIS FAILS
     56 print(f"{days0.dims = }")

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
    111     kwargs.update({name: arg for name, arg in zip_args})
    113     return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataarray.py:2950, in DataArray.unstack(self, dim, fill_value, sparse)
   2888 @_deprecate_positional_args("v2023.10.0")
   2889 def unstack(
   2890     self,
   (...)
   2894     sparse: bool = False,
   2895 ) -> Self:
   2896     """
   2897     Unstack existing dimensions corresponding to MultiIndexes into
   2898     multiple new dimensions.
   (...)
   2948     DataArray.stack
   2949     """
-> 2950     ds = self._to_temp_dataset().unstack(dim, fill_value=fill_value, sparse=sparse)
   2951     return self._from_temp_dataset(ds)

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:115, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
    111     kwargs.update({name: arg for name, arg in zip_args})
    113     return func(*args[:-n_extra_args], **kwargs)
--> 115 return func(*args, **kwargs)

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataset.py:5663, in Dataset.unstack(self, dim, fill_value, sparse)
   5659         result = result._unstack_full_reindex(
   5660             d, stacked_indexes[d], fill_value, sparse
   5661         )
   5662     else:
-> 5663         result = result._unstack_once(d, stacked_indexes[d], fill_value, sparse)
   5664 return result

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/dataset.py:5496, in Dataset._unstack_once(self, dim, index_and_vars, fill_value, sparse)
   5493     else:
   5494         fill_value_ = fill_value
-> 5496     variables[name] = var._unstack_once(
   5497         index=clean_index,
   5498         dim=dim,
   5499         fill_value=fill_value_,
   5500         sparse=sparse,
   5501     )
   5502 else:
   5503     variables[name] = var

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:1552, in Variable._unstack_once(self, index, dim, fill_value, sparse)
   1547     # Indexer is a list of lists of locations. Each list is the locations
   1548     # on the new dimension. This is robust to the data being sparse; in that
   1549     # case the destinations will be NaN / zero.
   1550     data[(..., *indexer)] = reordered
-> 1552 return self._replace(dims=new_dims, data=data)

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:957, in Variable._replace(self, dims, data, attrs, encoding)
    955 if encoding is _default:
    956     encoding = copy.copy(self._encoding)
--> 957 return type(self)(dims, data, attrs, encoding, fastpath=True)

File ~/opt/miniconda3/envs/p12/lib/python3.12/site-packages/xarray/core/variable.py:2625, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2623 super().__init__(dims, data, attrs, encoding, fastpath)
   2624 if self.ndim != 1:
-> 2625     raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")
   2627 # Unlike in Variable, always eagerly load values into memory
   2628 if not isinstance(self._data, PandasIndexingAdapter):

ValueError: IndexVariable objects must be 1-dimensional
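
The last frames of the traceback suggest the mechanism: `Variable._replace` rebuilds its result with `type(self)(...)`, so when the array being unstacked is backed by an `IndexVariable`, the new 2-D result is also constructed as an `IndexVariable`, whose `__init__` rejects anything that is not 1-dimensional. The sketch below reproduces only that pattern with hypothetical toy classes; `unstack_once` and `to_base_variable` here are simplified stand-ins, not xarray's implementations. Converting to the base class first is roughly what rebuilding `days` from `.data` in the MVCE does.

```python
# Hypothetical, heavily simplified stand-ins for xarray's Variable/IndexVariable.
# They mimic two things visible in the traceback: _replace rebuilds the result
# with type(self)(...), and IndexVariable refuses to be non-1-D.


class Variable:
    def __init__(self, dims, data):
        self.dims = tuple(dims)
        self.data = data  # nested Python lists stand in for an n-D array

    @property
    def ndim(self):
        return len(self.dims)

    def _replace(self, dims, data):
        # Like the traceback's `type(self)(dims, data, ...)`: the subclass is kept.
        return type(self)(dims, data)

    def unstack_once(self, new_dims, shape):
        # Fold the 1-D data into a 2-D nested list with the given (rows, cols) shape.
        rows, cols = shape
        folded = [self.data[r * cols:(r + 1) * cols] for r in range(rows)]
        return self._replace(new_dims, folded)

    def to_base_variable(self):
        # Rebuild as a plain Variable, dropping any subclass.
        return Variable(self.dims, self.data)


class IndexVariable(Variable):
    def __init__(self, dims, data):
        super().__init__(dims, data)
        if self.ndim != 1:
            raise ValueError(f"{type(self).__name__} objects must be 1-dimensional")


values = list(range(24))  # placeholder data; only the shape matters here

plain = Variable(("time",), values).unstack_once(("year", "month"), (2, 12))
print(plain.dims)  # ('year', 'month')

try:
    IndexVariable(("time",), values).unstack_once(("year", "month"), (2, 12))
except ValueError as err:
    print(err)  # IndexVariable objects must be 1-dimensional

fixed = IndexVariable(("time",), values).to_base_variable().unstack_once(("year", "month"), (2, 12))
print(fixed.dims)  # ('year', 'month')
```

Keeping `type(self)` in `_replace` presumably exists so that index variables stay index variables through 1-D operations; in this model the failure only appears when an operation changes the number of dimensions.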

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.0 | packaged by conda-forge | (main, Oct 3 2023, 08:36:57) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 23.5.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
zarr: None
cftime: 1.6.3
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.3.8
dask: 2024.5.0
distributed: 2024.5.0
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.5.0
cupy: None
pint: 0.24.1
sparse: 0.15.1
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.24.0
sphinx: None

brianpm · Jun 29 '24 01:06