Potential regression in Dataset.from_dataframe() not preserving timezone

Open Aloqeely opened this issue 1 year ago • 6 comments

What happened?

Converting a pandas DataFrame that has a timezone-aware datetime column to an xarray Dataset does not preserve the timezone. This only breaks in version 2024.5.

What did you expect to happen?

I would expect the timezone info to be preserved, as was the case before.

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr

df1 = pd.DataFrame(
    {"A": pd.date_range("20130101", periods=4, tz="US/Eastern"), "B": [1, 2, 3, 4]}
)
dataset = xr.Dataset.from_dataframe(df1)
df2 = dataset.to_dataframe()

print(df1.dtypes, dataset.dtypes, df2.dtypes, sep="\n\n")

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

# On xarray 2024.5.0:
A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Frozen({'A': dtype('<M8[ns]'), 'B': dtype('int64')})

A    datetime64[ns]
B             int64
dtype: object

# ---------------------------
#  On previous versions:

A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Frozen({'A': dtype('O'), 'B': dtype('int64')})

A    datetime64[ns, US/Eastern]
B                         int64
dtype: object

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.12.1 (tags/v3.12.1:2305ca5, Dec 7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.14.2
libnetcdf: None

xarray: 2024.5.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.10.0
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.8.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: 7.2.6

Aloqeely avatar May 14 '24 04:05 Aloqeely

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

welcome[bot] avatar May 14 '24 04:05 welcome[bot]

@ilan-gold are you able to take a look here please? I suspect it's related to extension array stuff

dcherian avatar May 22 '24 14:05 dcherian

Is dtype('O') from previous versions correct though?

ilan-gold avatar May 22 '24 15:05 ilan-gold

Ah, ok, I see, previously this was an object array of Timestamp objects and now it is being converted into a NumPy array with a "proper" datatype
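The two representations can be compared in pandas alone, without going through xarray. This is only a sketch of the distinction, not of xarray's internal conversion path; the variable names are illustrative, and the assumption that the dropped-timezone values are UTC-based holds for pandas' own `to_numpy` conversion shown here.

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("20130101", periods=4, tz="US/Eastern"))

# Object array: each element is a tz-aware pandas.Timestamp,
# so the timezone travels with every value.
as_objects = s.to_numpy(dtype=object)

# Native NumPy datetime64 has no timezone concept: pandas converts the
# values to UTC and the timezone information is dropped.
as_datetime64 = s.to_numpy(dtype="datetime64[ns]")

print(as_objects.dtype, type(as_objects[0]).__name__, as_objects[0].tz)
print(as_datetime64.dtype, as_datetime64[0])

# If the naive values are known to be UTC, the timezone can be re-attached by hand.
restored = pd.Series(as_datetime64).dt.tz_localize("UTC").dt.tz_convert("US/Eastern")
print(restored.dtype)
```

The object array keeps the timezone at the cost of a non-native dtype, which matches the `dtype('O')` seen in the older output above.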

ilan-gold avatar May 22 '24 15:05 ilan-gold

It's possible the previous behaviour was unintentional and this one is more "correct"/consistent ... Some exploration and reporting would be very helpful.

dcherian avatar May 22 '24 15:05 dcherian

Ok, so the problem is that a timezone-aware datetime column uses an extension array dtype (DatetimeTZDtype): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.api.types.is_extension_array_dtype.html

I will look into properly preserving the dtype then, although I suspect there is something else going on regarding datetimes (or the testing is not specific enough to cover this case)
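As a quick check of the point about extension dtypes, the linked `is_extension_array_dtype` helper can be run on the columns from the example at the top of the issue. This is a small sketch using only pandas; the `naive` variable is introduced here for comparison and is not part of the original report.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype

df = pd.DataFrame(
    {"A": pd.date_range("20130101", periods=4, tz="US/Eastern"), "B": [1, 2, 3, 4]}
)

# Same timestamps with the timezone stripped, for comparison.
naive = df["A"].dt.tz_localize(None)

# The tz-aware column is backed by pandas' DatetimeTZDtype (an extension dtype),
# while the naive datetime and integer columns use plain NumPy dtypes.
tz_is_ext = is_extension_array_dtype(df["A"].dtype)
naive_is_ext = is_extension_array_dtype(naive.dtype)
int_is_ext = is_extension_array_dtype(df["B"].dtype)

print(df["A"].dtype, tz_is_ext)
print(naive.dtype, naive_is_ext)
print(df["B"].dtype, int_is_ext)
```

Only the timezone-aware column reports as an extension dtype, which is consistent with the suspicion that the extension array handling is what changed its round-trip behaviour.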

ilan-gold avatar May 22 '24 15:05 ilan-gold