dataframe-api
Duration/timedelta not supported by dataframe interchange protocol?
Looks like timedeltas are currently not supported by the dataframe interchange protocol:
In [1]: pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[1], line 1
----> 1 pd.api.interchange.from_dataframe(pl.DataFrame({'a': [timedelta(1)]}))
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:71, in from_dataframe(df, allow_copy)
68 if not hasattr(df, "__dataframe__"):
69 raise ValueError("`df` does not support __dataframe__")
---> 71 return _from_dataframe(
72 df.__dataframe__(allow_copy=allow_copy), allow_copy=allow_copy
73 )
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:94, in _from_dataframe(df, allow_copy)
92 pandas_dfs = []
93 for chunk in df.get_chunks():
---> 94 pandas_df = protocol_df_chunk_to_pandas(chunk)
95 pandas_dfs.append(pandas_df)
97 if not allow_copy and len(pandas_dfs) > 1:
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:150, in protocol_df_chunk_to_pandas(df)
148 columns[name], buf = string_column_to_ndarray(col)
149 elif dtype == DtypeKind.DATETIME:
--> 150 columns[name], buf = datetime_column_to_ndarray(col)
151 else:
152 raise NotImplementedError(f"Data type {dtype} not handled yet")
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:395, in datetime_column_to_ndarray(col)
381 # Consider dtype being `uint` to get number of units passed since the 01.01.1970
383 data = buffer_to_ndarray(
384 dbuf,
385 (
(...)
392 length=col.size(),
393 )
--> 395 data = parse_datetime_format_str(format_str, data) # type: ignore[assignment]
396 data = set_nulls(data, col, buffers["validity"])
397 return data, buffers
File ~/tmp/.venv/lib/python3.10/site-packages/pandas/core/interchange/from_dataframe.py:360, in parse_datetime_format_str(format_str, data)
357 raise NotImplementedError(f"Date unit is not supported: {unit}")
358 return data
--> 360 raise NotImplementedError(f"DateTime kind is not supported: {format_str}")
NotImplementedError: DateTime kind is not supported: tDu
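For context, the `tDu` in the error message looks like an Arrow C data interface format string: `tD` marks a duration and the trailing letter gives the unit (`s`, `m`, `u`, `n` for seconds, milliseconds, microseconds, nanoseconds). Below is a minimal sketch of how such a string could be mapped to a NumPy timedelta dtype name. The helper `duration_format_to_numpy_dtype` is hypothetical and is not part of pandas or the interchange protocol.

```python
import re

# Unit codes used after "tD" in Arrow C data interface duration format strings,
# mapped to the unit names NumPy uses in timedelta64[...] dtypes.
_DURATION_UNITS = {"s": "s", "m": "ms", "u": "us", "n": "ns"}


def duration_format_to_numpy_dtype(format_str: str) -> str:
    """Hypothetical helper: map e.g. 'tDu' to 'timedelta64[us]'."""
    match = re.fullmatch(r"tD([smun])", format_str)
    if match is None:
        # Mirror the style of the error in the traceback for anything else.
        raise NotImplementedError(f"Not a duration format string: {format_str}")
    return f"timedelta64[{_DURATION_UNITS[match.group(1)]}]"


print(duration_format_to_numpy_dtype("tDu"))
```

A consumer could use a mapping like this to reinterpret the integer data buffer as timedeltas, in the same way the datetime path reinterprets it as timestamps.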
Should they be supported?
I'm +1 on supporting them.
There is some nuance in what these represent, at least between pandas and pyarrow. pandas has only the timedelta type, whereas pyarrow has a duration type (for second and finer precision) and an interval type (for calendar-based shifting).
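To illustrate the distinction using only the standard library: a duration is a fixed amount of elapsed time, while a calendar-based shift depends on the date it is applied to. The sketch below is not pyarrow or pandas code. `add_months` is a hypothetical helper, and clamping to the last day of the month is one assumed convention among several.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Hypothetical calendar-based shift; clamps the day to the month's end."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


start = date(2023, 1, 31)

# Duration-like: always exactly 30 days, whatever the start date.
fixed = start + timedelta(days=30)

# Interval-like: "one month later" depends on the calendar.
shifted = add_months(start, 1)

print(fixed)    # 2023-03-02
print(shifted)  # 2023-02-28
```

Only the first kind corresponds to a fixed-width integer count of time units, which is what a format string such as `tDu` describes.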