dataframe-api
RFC document, tooling and other content related to the dataframe API standard
In #10, it's been discussed that it would be convenient if the dataframe API allows method chaining. For example:

```python
import pandas

(pandas.read_csv('countries.csv')
       .rename(columns={'name': 'country'})
       .assign(area_km2=lambda df: df['area_m2'].astype(float) / 100)...
```
This topic came up on a [PyArrow issue](https://github.com/apache/arrow/issues/33982#issuecomment-1669278644) by Polars developers working on their native Dataframe Protocol Implementation. To note, in the PyArrow implementation of the protocol we decided to...
The DataFrame Interchange Protocol has a `nan_as_null` keyword in `__dataframe__` that can be specified by the consumer, i.e. the person/library calling this method. The docstring explains its goal: https://github.com/data-apis/dataframe-api/blob/d10a096ccc1612ec4acf8503aff58b3acb4e3738/protocol/dataframe_protocol.py#L400-L403 However,...
I don't know if it's possible, but having a standard way to thread through unit of measures would be great. Ideally you could implement something like [pint-pandas](https://github.com/hgrecco/pint-pandas) but instead as...
Came across this yesterday: https://github.com/microsoft/vscode-jupyter/pull/13951

```python
elif _VSCODE_builtins.hasattr(df, "to_pandas"):
    df = df.to_pandas().iloc[start:end]
```

This could be improved if they only needed to convert to pandas the part of the data...
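To make the cost difference concrete, here is a minimal sketch of "slice first, then convert" versus "convert first, then slice". `ToyFrame`, `slice_rows` and `to_records` are hypothetical stand-ins invented for this illustration (they are not part of pandas, the VS Code code above, or the dataframe API standard); `to_records` plays the role of an expensive conversion such as `to_pandas()`.

```python
class ToyFrame:
    """Hypothetical stand-in for a non-pandas dataframe (not a real library class)."""

    def __init__(self, columns):
        self.columns = columns      # dict: column name -> list of values
        self.rows_converted = 0     # bookkeeping for this demo only

    def slice_rows(self, start, end):
        # Cheap native slice (hypothetical method name).
        return ToyFrame({name: col[start:end] for name, col in self.columns.items()})

    def to_records(self):
        # Stands in for an expensive conversion such as to_pandas().
        names = list(self.columns)
        rows = list(zip(*(self.columns[name] for name in names)))
        self.rows_converted += len(rows)
        return [dict(zip(names, row)) for row in rows]


frame = ToyFrame({"id": list(range(1000)), "value": [i * 0.5 for i in range(1000)]})

# Convert everything, then slice: all 1000 rows go through the conversion.
convert_then_slice = frame.to_records()[10:20]

# Slice natively, then convert: only the 10 requested rows are converted.
window = frame.slice_rows(10, 20)
slice_then_convert = window.to_records()
```

Both paths yield the same ten rows; the difference is only in how many rows pass through the conversion step, which is what a standard slicing method would let a consumer avoid.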
Currently, the home page (https://data-apis.org/) points to the "DataFrame API" (https://data-apis.org/dataframe-api/draft/), but this then doesn't mention the interchange protocol (it only points back to the homepage). So for a moment...
The bulk of the dataframe interchange protocol was done in gh-38. There were still a number of TODOs however, and more will likely pop up once we have multiple implementations...
In some cases users like to use Array API functions (for example `where`) on DataFrame objects (in particular Series). Is this something that we would like to support in the...
In pyarrow we differentiate between missing (`null`) values, which we define with a bitmask, and `NaN` float values. From the dataframe interchange protocol specification we have understood that one _can_...
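As an illustration of the distinction described here, the following standard-library sketch keeps a validity bitmask next to a list of floats, so that a missing slot and a `NaN` value can be told apart. It assumes the Arrow convention of a packed bitmap with least-significant-bit ordering where a set bit means "valid"; the variable and function names are made up for this example and are not pyarrow APIs.

```python
import math

# Four float slots; slot 1 holds a real NaN, slot 2 is missing (its stored value is arbitrary).
values = [1.5, float("nan"), 0.0, 2.5]

# Validity bitmap packed into bytes, least-significant bit first: 1 = valid, 0 = null.
# Bits 0, 1 and 3 are set, so only slot 2 is null.
validity = bytes([0b00001011])


def is_valid(bitmap, i):
    """Return True if slot i is marked valid in the packed bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


# Missing values are found from the bitmap alone.
null_slots = [i for i in range(len(values)) if not is_valid(validity, i)]

# NaN values are valid slots whose float payload is NaN.
nan_slots = [i for i, v in enumerate(values) if is_valid(validity, i) and math.isnan(v)]
```

Here `null_slots` is `[2]` and `nan_slots` is `[1]`: the two kinds of "no usable number" live in different places, which is why a consumer cannot recover the distinction once they are merged.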
I am currently working on the implementation of the dataframe interchange protocol for PyArrow. After testing the current PyArrow implementation for producing a `__dataframe__` object with the pandas implementation for consuming...