xarray
to_base_variable: coerce multiindex data to numpy array
- [x] Closes #8887, and probably supersedes #8809
- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- ~~New functions/methods are listed in `api.rst`~~
@slevang this should also make the test case you added in #8809 work. I haven't added it here; instead I added a basic check that should be enough.
I don't really understand why the serialization backends (zarr?) do not seem to work with the `PandasMultiIndexingAdapter.__array__()` implementation, which should normally coerce the multi-index levels into numpy arrays as needed. Anyway, I guess that coercing early, as this PR does, doesn't hurt and may avoid the confusion of a non-indexed, isolated coordinate variable that still wraps a pandas.MultiIndex.
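For readers following along, here is a minimal sketch of what "coercing to numpy arrays" looks like from the public API. It does not exercise the internal adapter directly and it is not the change made in this PR; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same stacked array as in the reproducer used in this thread.
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
stacked = da.stack(feature=["dim1", "dim2"])

# The index behind the stacked dimension is a pandas.MultiIndex.
midx = stacked.indexes["feature"]
print(type(midx).__name__)

# Coercing the stacked coordinate gives a 1-D object array of level-value tuples.
feature_values = np.asarray(stacked["feature"])
print(feature_values.dtype, feature_values.shape)

# Each level coordinate coerces to a plain array of that level's values.
dim1_values = np.asarray(stacked["dim1"])
print(dim1_values.tolist())
```

The point of coercing early is that a variable detached from its index would then hold such plain arrays rather than a wrapper around the multi-index.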
Thanks @benbovy, this seems good, but still doesn't fix my original issue in #8809. See comment there for more detail.
This consistency check is still broken, though; I pushed it to this branch.
import numpy as np
import xarray as xr
# ND DataArray that gets stacked along a multiindex
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])
# Extract just the stacked coordinates for saving in a dataset
ds = xr.Dataset(data_vars={"feature": da.feature})
xr.testing.assertions._assert_internal_invariants(ds.reset_index(["feature", "dim1", "dim2"]), check_default_indexes=False) # succeeds
xr.testing.assertions._assert_internal_invariants(ds.reset_index(["feature"]), check_default_indexes=False) # fails, but no warning either
Wow it took me some time to figure that out:
ds = xr.Dataset(data_vars={"feature": da.feature})
So it detects the multi-index from `da.feature`, then assigns it to the `feature` variable, auto-promotes the latter to a coordinate, and finally auto-creates coordinates and indexes for the multi-index levels. That's a lot happening under the hood! The internal logic for handling this is complicated, very fragile and actually still buggy (in this case Xarray wrongly creates two separate Xarray indexes, one for the level coordinates and one for the "feature" dimension coordinate, so `reset_index` won't work as expected).
This is being addressed / discussed in #8140.
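A small sketch for inspecting the implicit steps described above, using only public attributes. What the loop prints for the indexes may differ between Xarray versions, so no output is claimed for it here.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])

# "feature" is passed as a data variable...
ds = xr.Dataset(data_vars={"feature": da.feature})

# ...but it ends up among the coordinates, together with the level coordinates.
coord_names = sorted(ds.coords)
data_var_names = sorted(ds.data_vars)
print(coord_names)
print(data_var_names)

# Show which index object backs each coordinate. Coordinates that share one
# multi-index should report the same object id; distinct ids would point to
# separate indexes.
for name, index in ds.xindexes.items():
    print(name, type(index).__name__, id(index))
```

Comparing the printed ids is one way to check whether the level coordinates and the "feature" coordinate are backed by a single index or by separate ones.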