Clarify if interchange dataframes should also have `__dataframe__()`
I couldn't work out if the interchange dataframe (i.e. the dataframe returned from __dataframe__()) should also have a __dataframe__() method, e.g.
>>> import pandas as pd
>>> df = pd.DataFrame() # i.e. the top-level dataframe
>>> interchange_df1 = df.__dataframe__()
>>> interchange_df2 = interchange_df1.__dataframe__()
With the upstream of current adopters, we have a split on whether the interchange dataframe has this method.
| Library | Top-level | Interchange |
|---|---|---|
| pandas | :heavy_check_mark: | :heavy_check_mark: |
| vaex | :heavy_check_mark: | :x: |
| modin | :heavy_check_mark: | :heavy_check_mark: |
| cuDF | :heavy_check_mark: | :x: |
I had assumed that interchange dataframes should have __dataframe__() by virtue of it being a method in the DataFrame API object. I think it makes sense, as then from_dataframe()-like functions only need to check for __dataframe__() to support interchanging both top-level and interchange dataframes of different libraries.
If there is explicit specification somewhere in this regard then please give me a pointer! In any case, it might be worth clarifying in the __dataframe__() docstring where this method should be residing.
If there is explicit specification somewhere in this regard then please give me a pointer! In any case, it might be worth clarifying in the
__dataframe__()docstring where this method should be residing.
I'm fairly sure we discussed this once or twice, and that the answer is yes but that there is an inconsistency between spec and implementations. Here is what the spec says currently:
https://github.com/data-apis/dataframe-api/blob/4f7c1e0c425643d57120c5b73434d992b3e83595/protocol/dataframe_protocol.py#L307-L310
And IIRC the implementations just return self. My recollection is that we agreed to make all the implementations consistent, and then update the spec. @vnlitvinov please correct me if I'm wrong here.
The relevant spec change to return self (or a 'DataFrame' instance in the type annotation) is gh-74, which is about to be merged.
Pandas: https://github.com/pandas-dev/pandas/blob/7214f85/pandas/core/interchange/dataframe.py#L38-L41
Modin:
- Pandas backend: https://github.com/modin-project/modin/blob/cfcdf081e197d35e1fd1526f5585e361793dc2fa/modin/core/dataframe/pandas/interchange/dataframe_protocol/dataframe.py#L82-L85
- Omnisci backend: https://github.com/modin-project/modin/blob/91ac777c17a52df5afa0d846b84bf9dd73c45d14/modin/experimental/core/execution/native/implementations/omnisci_on_native/interchange/dataframe_protocol/dataframe.py#L71-L74
Vaex: I just opened https://github.com/vaexio/vaex/pull/2205
cuDF: I just opened https://github.com/rapidsai/cudf/pull/11692