dataframe-api icon indicating copy to clipboard operation
dataframe-api copied to clipboard

Clarify if interchange dataframes should also have `__dataframe__()`

Open honno opened this issue 3 years ago • 3 comments

I couldn't work out if the interchange dataframe (i.e. the dataframe returned from __dataframe__()) should also have a __dataframe__() method, e.g.

>>> import pandas as pd
>>> df = pd.DataFrame()  # i.e. the top-level dataframe
>>> interchange_df1 = df.__dataframe__()
>>> interchange_df2 = interchange_df1.__dataframe__()

With the upstream of current adopters, we have a split on whether the interchange dataframe has this method.

Library Top-level Interchange
pandas :heavy_check_mark: :heavy_check_mark:
vaex :heavy_check_mark: :x:
modin :heavy_check_mark: :heavy_check_mark:
cuDF :heavy_check_mark: :x:

I had assumed that interchange dataframes should have __dataframe__() by virtue of it being a method in the DataFrame API object. I think it makes sense, as then from_dataframe()-like functions only need to check for __dataframe__() to support interchanging both top-level and interchange dataframes of different libraries.

If there is explicit specification somewhere in this regard then please give me a pointer! In any case, it might be worth clarifying in the __dataframe__() docstring where this method should be residing.

honno avatar Jul 14 '22 09:07 honno

If there is explicit specification somewhere in this regard then please give me a pointer! In any case, it might be worth clarifying in the __dataframe__() docstring where this method should be residing.

I'm fairly sure we discussed this once or twice, and that the answer is yes but that there is an inconsistency between spec and implementations. Here is what the spec says currently:

https://github.com/data-apis/dataframe-api/blob/4f7c1e0c425643d57120c5b73434d992b3e83595/protocol/dataframe_protocol.py#L307-L310

And IIRC the implementations just return self. My recollection is that we agreed to make all the implementations consistent, and then update the spec. @vnlitvinov please correct me if I'm wrong here.

rgommers avatar Jul 14 '22 15:07 rgommers

The relevant spec change to return self (or a 'DataFrame' instance in the type annotation) is gh-74, which is about to be merged.

rgommers avatar Jul 14 '22 16:07 rgommers

Pandas: https://github.com/pandas-dev/pandas/blob/7214f85/pandas/core/interchange/dataframe.py#L38-L41

Modin:

  • Pandas backend: https://github.com/modin-project/modin/blob/cfcdf081e197d35e1fd1526f5585e361793dc2fa/modin/core/dataframe/pandas/interchange/dataframe_protocol/dataframe.py#L82-L85
  • Omnisci backend: https://github.com/modin-project/modin/blob/91ac777c17a52df5afa0d846b84bf9dd73c45d14/modin/experimental/core/execution/native/implementations/omnisci_on_native/interchange/dataframe_protocol/dataframe.py#L71-L74

Vaex: I just opened https://github.com/vaexio/vaex/pull/2205

cuDF: I just opened https://github.com/rapidsai/cudf/pull/11692

rgommers avatar Sep 13 '22 13:09 rgommers