xarray
xarray copied to clipboard
Currently no way to create a Coordinates object without indexes for 1D variables
What happened?
The workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263 does not seem to work on main
, meaning that I think there is currently no way to create an xr.Coordinates
object without 1D variables being coerced to indexes. This means there is no way to create a Dataset
object without 1D variables becoming IndexVariables
being coerced to indexes.
What did you expect to happen?
I expected to at least be able to use the workaround described in https://github.com/pydata/xarray/pull/8107#discussion_r1311214263, i.e.
xr.Coordinates({'x': ('x', uarr)}, indexes={})
where uarr
is an un-indexable array-like.
Minimal Complete Verifiable Example
class UnindexableArrayAPI:
...
class UnindexableArray:
"""
Presents like an N-dimensional array but doesn't support changes of any kind,
nor can it be coerced into a np.ndarray or pd.Index.
"""
_shape: tuple[int, ...]
_dtype: np.dtype
def __init__(self, shape: tuple[int, ...], dtype: np.dtype) -> None:
self._shape = shape
self._dtype = dtype
self.__array_namespace__ = UnindexableArrayAPI
@property
def dtype(self) -> np.dtype:
return self._dtype
@property
def shape(self) -> tuple[int, ...]:
return self._shape
@property
def ndim(self) -> int:
return len(self.shape)
@property
def size(self) -> int:
return np.prod(self.shape)
@property
def T(self) -> Self:
raise NotImplementedError()
def __repr__(self) -> str:
return f"UnindexableArray(shape={self.shape}, dtype={self.dtype})"
def _repr_inline_(self, max_width):
"""
Format to a single line with at most max_width characters. Used by xarray.
"""
return self.__repr__()
def __getitem__(self, key, /) -> Self:
"""
Only supports extremely limited indexing.
I only added this method because xarray will apparently attempt to index into its lazy indexing classes even if the operation would be a no-op anyway.
"""
from xarray.core.indexing import BasicIndexer
if isinstance(key, BasicIndexer) and key.tuple == ((slice(None),) * self.ndim):
# no-op
return self
else:
raise NotImplementedError()
def __array__(self) -> np.ndarray:
raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")
uarr = UnindexableArray(shape=(3,), dtype=np.dtype('int32'))
xr.Variable(data=uarr, dims=['x']) # works fine
xr.Coordinates({'x': ('x', uarr)}, indexes={}) # works in xarray v2023.08.0
but in versions after that it triggers the NotImplementedError in __array__
:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[59], line 1
----> 1 xr.Coordinates({'x': ('x', uarr)}, indexes={})
File ~/Documents/Work/Code/xarray/xarray/core/coordinates.py:301, in Coordinates.__init__(self, coords, indexes)
299 variables = {}
300 for name, data in coords.items():
--> 301 var = as_variable(data, name=name)
302 if var.dims == (name,) and indexes is None:
303 index, index_vars = create_default_index_implicit(var, list(coords))
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:159, in as_variable(obj, name)
152 raise TypeError(
153 f"Variable {name!r}: unable to convert object into a variable without an "
154 f"explicit list of dimensions: {obj!r}"
155 )
157 if name is not None and name in obj.dims and obj.ndim == 1:
158 # automatically convert the Variable into an Index
--> 159 obj = obj.to_index_variable()
161 return obj
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:572, in Variable.to_index_variable(self)
570 def to_index_variable(self) -> IndexVariable:
571 """Return this variable as an xarray.IndexVariable"""
--> 572 return IndexVariable(
573 self._dims, self._data, self._attrs, encoding=self._encoding, fastpath=True
574 )
File ~/Documents/Work/Code/xarray/xarray/core/variable.py:2642, in IndexVariable.__init__(self, dims, data, attrs, encoding, fastpath)
2640 # Unlike in Variable, always eagerly load values into memory
2641 if not isinstance(self._data, PandasIndexingAdapter):
-> 2642 self._data = PandasIndexingAdapter(self._data)
File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:1481, in PandasIndexingAdapter.__init__(self, array, dtype)
1478 def __init__(self, array: pd.Index, dtype: DTypeLike = None):
1479 from xarray.core.indexes import safe_cast_to_index
-> 1481 self.array = safe_cast_to_index(array)
1483 if dtype is None:
1484 self._dtype = get_valid_numpy_dtype(array)
File ~/Documents/Work/Code/xarray/xarray/core/indexes.py:469, in safe_cast_to_index(array)
459 emit_user_level_warning(
460 (
461 "`pandas.Index` does not support the `float16` dtype."
(...)
465 category=DeprecationWarning,
466 )
467 kwargs["dtype"] = "float64"
--> 469 index = pd.Index(np.asarray(array), **kwargs)
471 return _maybe_cast_to_cftimeindex(index)
Cell In[55], line 63, in UnindexableArray.__array__(self)
62 def __array__(self) -> np.ndarray:
---> 63 raise NotImplementedError("UnindexableArrays can't be converted into numpy arrays or pandas Index objects")
NotImplementedError: UnindexableArrays can't be converted into numpy arrays or pandas Index objects
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
No response
Anything else we need to know?
Context is #8699
Environment
Versions described above
Thanks @TomNicholas. There is a possible fix for this implemented in #8124. I've hit some general issues with the approach taken in that PR (described in the top comment) but I'm wondering if we couldn't cherry pick some of the commits there: 26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and 7741b2ab0c24960908b5a39e94e9cdcf375d5084. I'm not sure this will just work, though.
As an (ugly) workaround, xarray Variable
objects can be constructed manually and then passed to xarray.Coordinates._construct_direct()
.
Thanks @benbovy!
xarray
Variable
objects can be constructed manually and then passed toxarray.Coordinates._construct_direct()
.
That's useful, thanks.
I'm wondering if we couldn't cherry pick some of the commits there: https://github.com/pydata/xarray/commit/26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and https://github.com/pydata/xarray/commit/7741b2ab0c24960908b5a39e94e9cdcf375d5084.
That would be great.
I'm wondering if the change inside xr.merge
on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook. (I can make a separate issue to track that error if that would be helpful)
I'm wondering if we couldn't cherry pick some of the commits there: https://github.com/pydata/xarray/commit/26fa16a36ba04d6aaf4e6ad6f4670c9b5be0a9d4 and https://github.com/pydata/xarray/commit/7741b2ab0c24960908b5a39e94e9cdcf375d5084.
I'm not sure this will just work, though.
I tried this in #8711 but it did not just work (yet) - see comment there.
I'm wondering if the change inside xr.merge on this line might also fix the other problem I'm having with indexes being automatically created - see the final error in this notebook.
Yes I think it will also fix this one.