`from_dataframe` gets a `cast_extension_array` argument

Open ilan-gold opened this issue 6 months ago • 3 comments

Is your feature request related to a problem?

To resolve the ongoing discussion around extension array casting, one option that has come up is to add a cast_extension_array argument to from_dataframe (see also https://github.com/pydata/xarray/issues/10301#issuecomment-2866569942) so that users can get the "old" behavior (see #10301 for an example where this might help).

Whether it defaults to true or false is probably up for debate, but I think this is a great feature independent of the chaos that has arisen.

Describe the solution you'd like

from_dataframe gets a cast_extension_array argument

Describe alternatives you've considered

The alternative would basically be to keep fixing things as they arise while staying totally permissive. But that is likely what will happen anyway (i.e., even with this feature, you can still set an extension array type on a Dataset directly, no issue).

Additional context

cc @dcherian @keewis

ilan-gold avatar Jun 04 '25 16:06 ilan-gold

I'd personally default to True, as that would give everyone who doesn't opt in the old behavior, and we can warn anyone who does opt in that there are still a few wrinkles with this feature.

Another option would be to enumerate all the types that should not be cast (defaulting to "none" or ["interval", "categorical"]), and allow short-hands like None / empty list / "none" for cast everything or "all" for allow everything.
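To make the shorthand idea concrete, here is a minimal sketch of how such a spec could be normalized. Everything in it is hypothetical: `normalize_keep_types`, `should_cast`, and the `KEEP_ALL` sentinel are names made up for illustration, not xarray API, and the type names ("interval", "categorical") are just the placeholders from above rather than real pandas dtype names.

```python
# Hypothetical sketch, not xarray API: resolve the shorthand forms to one
# canonical representation of "extension types that should NOT be cast".

KEEP_ALL = object()  # sentinel meaning "leave every extension array uncast"


def normalize_keep_types(spec):
    """Return a frozenset of type names to keep, or KEEP_ALL."""
    if spec is None:
        return frozenset()  # None -> cast everything
    if isinstance(spec, str):
        lowered = spec.lower()
        if lowered == "none":
            return frozenset()  # "none" -> cast everything
        if lowered == "all":
            return KEEP_ALL  # "all" -> cast nothing
        return frozenset({lowered})  # a single type name
    # Any other iterable of names; an empty list also means "cast everything".
    return frozenset(name.lower() for name in spec)


def should_cast(type_name, keep):
    """Decide whether a column of the given extension type gets cast."""
    if keep is KEEP_ALL:
        return False
    return type_name.lower() not in keep


keep = normalize_keep_types(["interval", "categorical"])
print(should_cast("categorical", keep))  # kept as an extension array
print(should_cast("period", keep))       # cast, since it is not listed
```

Normalizing once up front would keep the conversion code itself down to a single membership check per column, whichever default ends up being chosen.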

keewis avatar Jun 09 '25 17:06 keewis

@keewis I guess we'll just make it super clear in the release notes, but I have a feeling we will break CI for some people who are roundtripping with extension arrays and have this behavior baked in.

ilan-gold avatar Jun 10 '25 07:06 ilan-gold

Yeah, I guess so. I'm assuming that rolling back will protect more people than it will force to change something. I think what we should do is something like

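from packaging.version import Version
import xarray as xr

# Only pass the proposed argument on releases that would accept it.
if Version(xr.__version__) >= Version("2025.06.0"):
    options = {"cast_extension_arrays": True}  # or whatever the value we choose
else:
    options = {}

xr.Dataset.from_dataframe(df, ..., **options)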

The only thing that we will need to keep in mind is that we'd also have to figure out how to pass this through df.to_xarray().

keewis avatar Jun 10 '25 14:06 keewis