
[BUG-REPORT] Interchange `Column.size` returns 0d arrays as opposed to Python `int`

Open honno opened this issue 2 years ago • 1 comments

Interchange columns obtained from a vaex dataframe via the interchange protocol report `size` with the wrong type.

>>> import vaex
>>> df = vaex.from_dict({"a": ["foo", "bar"]})
>>> interchange_df = df.__dataframe__()
>>> col = interchange_df.get_column(0)
>>> col.size
array(2)  # should be a Python integer

This is because `Column.size` returns the output of a count operation without converting it to a Python integer.
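The difference between the two types, and one possible conversion, can be sketched with NumPy alone. This is a minimal illustration, not the actual vaex code: `count_result` stands in for whatever the count operation returns, and `as_python_int` is a hypothetical helper name.

```python
import numpy as np

# Assumption: stand-in for the count operation's result, a 0-d NumPy array.
count_result = np.array(2)


def as_python_int(value):
    """Hypothetical helper: turn a 0-d array or NumPy scalar into a built-in int."""
    return int(value)


size = as_python_int(count_result)

# The 0-d array is not an int instance, while the converted value is.
print(isinstance(count_result, int), isinstance(size, int))
```

A consumer that checks `isinstance(size, int)` or passes `size` to code expecting a plain integer would accept the converted value but may reject the 0-d array.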

https://github.com/vaexio/vaex/blob/8fada6dd422a82b6f3b50f5c34a46b412536bac4/packages/vaex-core/vaex/dataframe_protocol.py#L318-L323

This prevents interop with https://github.com/pandas-dev/pandas/pull/46141 due to this line assuming (correctly) that size should be a Python integer.

Vaex was built locally from source (upstream master) on Ubuntu 20.04.

honno, Jun 22 '22 09:06

I will submit a PR fixing this.

honno, Aug 04 '22 10:08