xarray
Cannot export dataset with categorical index in 2025.4.0
What happened?
In 2025.4.0 and on the current master, exporting a dataset created from a DataFrame with a categorical index to netCDF raises the following error:
TypeError: Cannot interpret 'CategoricalDtype(categories=['C1', 'C2'], ordered=True, categories_dtype=object)' as a data type
What did you expect to happen?
In 2025.3.1 and before, it was possible to export such a dataset (although the categorical index might be lost in the process).
Minimal Complete Verifiable Example
import pandas as pd
import xarray as xr
df = pd.DataFrame([{"ind": "C1", "val": 1.0}, {"ind": "C2", "val": 2.0}]).set_index("ind")
df.index = df.index.astype(pd.CategoricalDtype(categories=["C1", "C2"], ordered=True))
ds = df.to_xarray()
ds.to_netcdf("foo.nc")
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
Might be related to #10301.
Arguably, the new behavior is better than silently converting to another type. But then, the changelog of 2025.4.0 might need a bit more information on how to update your code for this new behavior.
(Cross-ref: https://github.com/capytaine/capytaine/issues/683)
Environment
xarray: 2025.4.1.dev16+gc8affb3c
pandas: 2.2.3
numpy: 2.2.5
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: 3.13.0
zarr: None
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.10.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.2
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: 25.0.1
conda: None
pytest: 8.3.4
mypy: None
IPython: 8.32.0
sphinx: 8.1.3
</details>
Yes, this hasn't been built yet. We could use either netCDF enums or the CF flag variable conventions for this. The latter generalizes across array formats, so it would be good to do that by default, I think.
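To make the CF flag idea concrete, here is a rough sketch of what such an encoding could look like on the pandas side. It is not xarray's implementation (none exists yet, per the comment above). The helper names `encode_cf_flags` and `decode_cf_flags` are hypothetical, and the sketch assumes category labels are strings without spaces, since `flag_meanings` is a blank-separated list. CF flags do not record whether the categories are ordered, so that has to be supplied again on decoding.

```python
import numpy as np
import pandas as pd


def encode_cf_flags(index: pd.CategoricalIndex):
    """Hypothetical helper: split a categorical index into integer codes
    plus CF-style ``flag_values`` / ``flag_meanings`` attributes."""
    codes = np.asarray(index.codes)  # missing values show up as -1
    categories = [str(c) for c in index.categories]
    if any(" " in c for c in categories):
        raise ValueError("flag_meanings is blank-separated; labels must not contain spaces")
    attrs = {
        "flag_values": np.arange(len(categories), dtype=codes.dtype),
        "flag_meanings": " ".join(categories),
    }
    return codes, attrs


def decode_cf_flags(codes, attrs, ordered=False):
    """Hypothetical inverse: rebuild the categorical index from codes and attrs.
    ``ordered`` is not stored in the CF attributes, so the caller passes it."""
    categories = attrs["flag_meanings"].split()
    cat = pd.Categorical.from_codes(codes, categories=categories, ordered=ordered)
    return pd.CategoricalIndex(cat)


idx = pd.CategoricalIndex(["C1", "C2"], categories=["C1", "C2"], ordered=True)
codes, attrs = encode_cf_flags(idx)
restored = decode_cf_flags(codes, attrs, ordered=True)
```

The integer codes and the two attributes are plain numeric and string data, which is why this approach is not tied to netCDF specifically.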
As of #9671, xarray supports extension array indexes as well. Those now go into the xarray object untouched, and xarray then attempts to write them to disk, but it seems that netCDF writing lacks support for them.
Previously, these were just thrown into numpy object dtype containers once they crossed from pandas to xarray, and were then written as fixed-size strings. Quite a departure from the original position, but now we have to handle the original data type.
Use ds.as_numpy() to recover previous behaviour.
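If you would rather handle this before the data reaches xarray, another option is to drop the categorical dtype on the pandas side. This is a minimal sketch, assuming a single-level index. `decategorize_index` is a hypothetical helper name, not part of pandas or xarray. The result has an object-dtype index, which mirrors what the conversion used to produce, at the cost of losing the category ordering.

```python
import pandas as pd


def decategorize_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of ``df`` whose categorical index
    has been cast to plain object dtype (single-level index only)."""
    out = df.copy()
    if isinstance(out.index.dtype, pd.CategoricalDtype):
        out.index = out.index.astype(object)  # the index name is preserved
    return out


df = pd.DataFrame([{"ind": "C1", "val": 1.0}, {"ind": "C2", "val": 2.0}]).set_index("ind")
df.index = df.index.astype(pd.CategoricalDtype(categories=["C1", "C2"], ordered=True))

plain = decategorize_index(df)
# With xarray installed, the export would then start from the plain index:
# plain.to_xarray().to_netcdf("foo.nc")
```

The original `df` is left unchanged, so the categorical index is still available for anything else that needs it.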