xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Currently no way to create a Coordinates object without indexes for 1D variables

Open TomNicholas opened this issue 5 months ago • 4 comments

What happened?

The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on main, meaning that I think there is currently no way to create an xr.Coordinates object without 1D variables being coerced to indexes. This means there is no way to create a Dataset object without 1D variables becoming IndexVariables being coerced to indexes.

What did you expect to happen?

I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.

xr.Coordinates({'x': ('x', uarr)}, indexes={})

where uarr is an un-indexable array-like.

Minimal Complete Verifiable Example

class UnindexableArrayAPI:
    ...


class UnindexableArray:
    """
    Presents like an N-dimensional array but doesn't support changes of any kind, 
    nor can it be coerced into a np.ndarray or pd.Index.
    """
    
    _shape: tuple[int, ...]
    _dtype: np.dtype
    
    def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
        self._shape = shape
        self._dtype = dtype
        self.__array_namespace__ = UnindexableArrayAPI

    @property
    def dtype(self) -> np.dtype:
        return self._dtype
    
    @property
    def shape(self) -> tuple[int, ...]:
        return self._shape
    
    @property
    def ndim(self) -> int:
        return len(self.shape)

    @property
    def size(self) -> int:
        return np.prod(self.shape)

    @property
    def T(self) -> Self:
        raise NotImplementedError()

    def __repr__(self) -> str:
        return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"

    def _repr_inline_(self, max_width):
        """
        Format to a single line with at most max_width characters. Used by xarray.
        """
        return self.__repr__()

    def __getitem__(self, key, /) -> Self:
        """
        Only supports extremely limited indexing.
        
        I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
        """
        from xarray.core.indexing import BasicIndexer
        
        if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
            # no-op
            return self
        else:
            raise NotImplementedError()

    def __array__(self) -> np.ndarray:
        raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))

xr.Variable(data=uarr, dims=['x'])  # works fine

xr.Coordinates({'x': ('x', uarr)}, indexes={})  # works in xarray v2023.08.0

but in versions after that it triggers the NotImplementedError in __array__:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})

File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
    299 variables = {}
    300 for name, data in coords.items():
--> 301     var = as_variable(data, name=name)
    302     if var.dims == (name,) and indexes is None:
    303         index, index_vars = create_default_index_implicit(var, list(coords))

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
    152     raise TypeError(
    153         f"Variable {name!r}: unable to convert object into a variable without an "
    154         f"explicit list of dimensions: {obj!r}"
    155     )
    157 if name is not None and name in obj.dims and obj.ndim == 1:
    158     # automatically convert the Variable into an Index
--> 159     obj = obj.to_index_variable()
    161 return obj

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
    570 def to_index_variable(self) -> IndexVariable:
    571     """Return this variable as an xarray.IndexVariable"""
--> 572     return IndexVariable(
    573         self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
    574     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
   2640 # Unlike in Variable, always eagerly load values into memory
   2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642     self._data = PandasIndexingAdapter(self._data)

File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
   1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
   1479     from xarray.core.indexes import safe_cast_to_index
-> 1481     self.array = safe_cast_to_index(array)
   1483     if dtype is None:
   1484         self._dtype = get_valid_numpy_dtype(array)

File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
    459             emit_user_level_warning(
    460                 (
    461                     "`pandas.Index` does not support the `float16` dtype."
   (...)
    465                 category=DeprecationWarning,
    466             )
    467             kwargs["dtype"] = "float64"
--> 469     index = pd.Index(np.asarray(array), **kwargs)
    471 return _maybe_cast_to_cftimeindex(index)

Cell In[55], line 63, in UnindexableArray.__array__(self)
     62 def __array__(self) -> np.ndarray:
---> 63     raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")

NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Context is #8699

Environment

Versions described above

TomNicholas avatar Feb 04 '24 18:02 TomNicholas

Thanks @TomNicholas. There is a possible fix for this implemented in #8124. I've hit some general issues with the approach taken in that PR (described in the top comment) but I'm wondering if we couldn't cherry pick some of the commits there: 26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and 7741b2ab0c24960908b5a39e94e9cdcf375d5084. I'm not sure this will just work, though.

As an (ugly) workaround, xarray Variable objects can be constructed manually and then passed to xarray.Coordinates._construct_direct().

benbovy avatar Feb 05 '24 12:02 benbovy

Thanks @benbovy!

xarray Variable objects can be constructed manually and then passed to xarray.Coordinates._construct_direct().

That's useful, thanks.

I'm wondering if we couldn't cherry pick some of the commits there: https://github.com/pydata/xarray/commit/26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and https://github.com/pydata/xarray/commit/7741b2ab0c24960908b5a39e94e9cdcf375d5084.

That would be great.

I'm wondering if the change inside xr.merge on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook. (I can make a separate issue to track that error if that would be helpful)

TomNicholas avatar Feb 05 '24 15:02 TomNicholas

I'm wondering if we couldn't cherry pick some of the commits there: https://github.com/pydata/xarray/commit/26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and https://github.com/pydata/xarray/commit/7741b2ab0c24960908b5a39e94e9cdcf375d5084.

I'm not sure this will just work, though.

I tried this in #8711 but it did not just work (yet) - see comment there.

TomNicholas avatar Feb 05 '24 22:02 TomNicholas

I'm wondering if the change inside xr.merge on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook.

Yes I think it will also fix this one.

benbovy avatar Feb 06 '24 12:02 benbovy