xarray
`from_dataframe` gets a `cast_extension_array` argument
Is your feature request related to a problem?
To resolve the ongoing discussion around extension array casting, one option that has come up is to add a `cast_extension_array` argument to `from_dataframe` (see also https://github.com/pydata/xarray/issues/10301#issuecomment-2866569942) in order to give users the "old" behavior (see #10301 for an example where this might help).
Whether this defaults to true or false is probably up for debate, but I think this is a great feature independent of the chaos that has arisen.
Describe the solution you'd like
from_dataframe gets a cast_extension_array argument
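As a rough illustration of what "casting" could mean here, the sketch below converts extension-array columns to plain NumPy arrays before the frame is handed to xarray. It assumes the "old" behavior amounts to such a conversion; `cast_extension_columns` is a hypothetical helper, not an existing pandas or xarray function, and it only looks at columns (not the index).

```python
import numpy as np
import pandas as pd


def cast_extension_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` whose extension-array columns are NumPy-backed.

    Hypothetical helper for illustration only; assumes unique column names.
    """
    out = df.copy()
    for name in out.columns:
        if pd.api.types.is_extension_array_dtype(out[name].dtype):
            # np.asarray materialises the values, e.g. the categories of a
            # categorical column, as an ordinary NumPy array.
            out[name] = np.asarray(out[name])
    return out


df = pd.DataFrame(
    {
        "level": pd.Categorical([1, 2, 1]),  # extension-array column
        "value": [0.1, 0.2, 0.3],            # plain float column
    }
)
plain = cast_extension_columns(df)
```

The resulting `plain` frame could then be passed to `xr.Dataset.from_dataframe` on any xarray version, which is roughly what a cast-by-default argument would do on the user's behalf.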
Describe alternatives you've considered
The alternative would basically be to keep fixing things as they arise while being totally permissive. But that is likely what will happen anyway (i.e., even with this feature you can still set an extension array type on a Dataset, no issue).
Additional context
cc @dcherian @keewis
I'd personally default to True, as that would give everyone who doesn't opt in the old behavior, and we can warn anyone who does opt in that there are still a few wrinkles with this feature.
Another option would be to enumerate all the types that should not be cast (defaulting to "none" or ["interval", "categorical"]), and to allow shorthands like None / empty list / "none" for "cast everything" or "all" for "allow everything".
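A minimal sketch of how such a shorthand could be normalised, using only the standard library. The names `normalize_preserved_types`, `should_cast` and `PRESERVE_ALL` are hypothetical and nothing here reflects an actual xarray API; the type names are simply the strings mentioned above.

```python
from collections.abc import Iterable

# Hypothetical sentinel meaning "leave every extension array type uncast".
PRESERVE_ALL = "all"


def normalize_preserved_types(spec):
    """Turn the user-facing shorthand into a set of type names to keep uncast."""
    if spec is None:
        return frozenset()  # None: cast everything
    if isinstance(spec, str):
        lowered = spec.lower()
        if lowered == "none":
            return frozenset()  # "none": cast everything
        if lowered == "all":
            return PRESERVE_ALL  # "all": cast nothing
        return frozenset({lowered})  # a single type name
    if isinstance(spec, Iterable):
        # An empty list yields an empty set, i.e. cast everything.
        return frozenset(str(name).lower() for name in spec)
    raise TypeError(f"unsupported specification: {spec!r}")


def should_cast(type_name, spec):
    """Return True if an extension array of ``type_name`` should be cast."""
    preserved = normalize_preserved_types(spec)
    if preserved == PRESERVE_ALL:
        return False
    return type_name.lower() not in preserved
```

With this shape, `should_cast("interval", ["interval", "categorical"])` is false while any type outside the list is still cast, and `None`, `[]` and `"none"` all behave identically.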
@keewis I guess we'll just make it super clear in the release notes, but I have a feeling we will break CI for some people who are round-tripping with extension arrays and have this behavior baked in.
Yeah, I guess so. I'm assuming that rolling back will protect more people than it will force to change something. I think what we should do is something like:
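from packaging.version import Version
import xarray as xr

# Pass the new keyword only on releases that accept it.
if Version(xr.__version__) >= Version("2025.06.0"):
    options = {"cast_extension_arrays": True}  # or whatever value we choose
else:
    options = {}
xr.Dataset.from_dataframe(df, ..., **options)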
The only thing that we will need to keep in mind is that we'd also have to figure out how to pass this through df.to_xarray().