Dataset Index not included as DataFrame column in `._to_dataframe()` when its name differs from the dimension name

Open stijnvanhoey opened this issue 2 months ago • 4 comments

What happened?

The documentation for .to_dataframe states: "Other coordinates are included as columns in the DataFrame."

When the function is applied to a Dataset containing an index whose name differs from that of the corresponding dimension, the coordinate is not included in the resulting pandas DataFrame. For example:

import xarray as xr
import pandas as pd
import numpy as np

ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1., 2., 4.2, 8., 10.]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")

The example Dataset looks like

<xarray.Dataset> Size: 88B
Dimensions:  (time: 1, pos: 5)
Coordinates:
  * time     (time) datetime64[ns] 8B 2025-01-01
  * pf       (pos) float64 40B 1.0 2.0 4.2 8.0 10.0
Dimensions without coordinates: pos
Data variables:
    temp     (time, pos) int64 40B 5 10 15 20 25

Converting the Dataset to a Pandas DataFrame:

ds_temp.to_dataframe()

The returned DataFrame is missing the pf coordinate:

                temp
time       pos      
2025-01-01 0       5
           1      10
           2      15
           3      20
           4      25

Dropping the index before calling to_dataframe does include the coordinate in the DataFrame:

>>> ds_temp.drop_indexes("pf").to_dataframe()
                temp    pf
time       pos            
2025-01-01 0       5   1.0
           1      10   2.0
           2      15   4.2
           3      20   8.0
           4      25  10.0
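
The workaround above can be wrapped in a small helper. This is only a sketch: the function name to_dataframe_keep_coords is hypothetical, and it assumes that every index whose name is not a dimension name can be dropped on its own (which may not hold for the levels of a pandas MultiIndex, for example).

```python
import numpy as np
import pandas as pd
import xarray as xr


def to_dataframe_keep_coords(ds: xr.Dataset) -> pd.DataFrame:
    """Hypothetical helper: drop indexes not named after a dimension, then convert."""
    # Indexed coordinates whose name differs from every dimension name,
    # e.g. "pf", which is indexed along the "pos" dimension.
    non_dim_indexes = [name for name in ds.xindexes if name not in ds.dims]
    if non_dim_indexes:
        # Same workaround as in the issue: drop the index, keep the coordinate.
        ds = ds.drop_indexes(non_dim_indexes)
    return ds.to_dataframe()


ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1.0, 2.0, 4.2, 8.0, 10.0]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")

df = to_dataframe_keep_coords(ds_temp)
print(df)
```

The helper leaves the original Dataset untouched, since drop_indexes returns a new object.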

This behavior changed between recent releases: in version 2025.1.2 the column was still included. I assume the change results from the added support for ExtensionArray.

What did you expect to happen?

An index whose name differs from the dimension name is also included in the resulting DataFrame; in the example, pf would appear as a column in the final DataFrame.

Minimal Complete Verifiable Example

import xarray as xr
import pandas as pd
import numpy as np
xr.show_versions()

ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1., 2., 4.2, 8., 10.]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")
df = ds_temp.to_dataframe()
assert "pf" in df.columns

Steps to reproduce

The resulting DataFrame lacks pf as a column.

MVCE confirmation

  • [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [x] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [x] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output


Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.2 (main, Nov 30 2024, 21:22:50) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.1.0-37-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development
xarray: 2025.10.1
pandas: 2.3.3
numpy: 2.2.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: 3.0.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.2.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.2.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.8.0
pip: 25.0.1
conda: None
pytest: 8.3.5
mypy: 1.15.0
IPython: 9.0.1
sphinx: 8.2.3

stijnvanhoey avatar Oct 14 '25 13:10 stijnvanhoey

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot] avatar Oct 14 '25 13:10 welcome[bot]

Hitting the same issue. It works with xarray=2025.9.0 but not xarray=2025.10.1.

ziw-liu avatar Oct 14 '25 21:10 ziw-liu

I tested all versions in between: the breaking change was introduced in 2025.9.1, and the subsequent '(actually) fixing breaking change' patches did not fix this one.

ziw-liu avatar Oct 14 '25 21:10 ziw-liu
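
For anyone repeating this kind of version-by-version check, a small probe along these lines could be run in each environment. It is a sketch only: the function name pf_column_present is hypothetical, and the probe reports what the installed xarray does rather than asserting which result is correct.

```python
import numpy as np
import pandas as pd
import xarray as xr


def pf_column_present() -> bool:
    """Hypothetical probe: does to_dataframe() keep the indexed 'pf' coordinate?"""
    ds = xr.Dataset(
        data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
        coords=dict(
            pf=("pos", [1.0, 2.0, 4.2, 8.0, 10.0]),
            time=("time", [pd.to_datetime("2025-01-01")]),
        ),
    ).set_xindex("pf")
    return "pf" in ds.to_dataframe().columns


# Print the installed version next to the observed behavior.
result = pf_column_present()
print(f"xarray {xr.__version__}: pf column present = {result}")
```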

Same issue here. I think it got introduced by the change from dims to xindexes in https://github.com/pydata/xarray/commit/2b947e94971e3fe82e6b73610c8d797e833ea567

JJFlorian avatar Oct 20 '25 15:10 JJFlorian
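
To illustrate why a switch from dimension names to index names could matter for this dataset, the sketch below compares the two. It makes no claim about what the linked commit actually does; it only shows that pf appears among the index names while pos appears among the dimension names.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds_temp = xr.Dataset(
    data_vars=dict(temp=(["time", "pos"], np.array([[5, 10, 15, 20, 25]]))),
    coords=dict(
        pf=("pos", [1.0, 2.0, 4.2, 8.0, 10.0]),
        time=("time", [pd.to_datetime("2025-01-01")]),
    ),
).set_xindex("pf")

dim_names = set(ds_temp.dims)        # names of the dimensions
index_names = set(ds_temp.xindexes)  # names of the indexed coordinates

print(sorted(dim_names))                # ['pos', 'time']
print(sorted(index_names))              # ['pf', 'time']
# Indexed coordinates that are not named after a dimension:
print(sorted(index_names - dim_names))  # ['pf']
```

Any code that treats the two name sets as interchangeable would handle time the same way in both cases, but would see pf in one set and pos in the other.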