[BUG-REPORT] `describe_categorical` in interchange columns is a tuple, not a dict
In the interchange protocol, `describe_categorical` should return a dict (note that the type annotation in the spec's API is faulty), but Vaex returns a tuple:
https://github.com/vaexio/vaex/blob/35c250d585f889272b8ef1096de6fa5462816f52/packages/vaex-core/vaex/dataframe_protocol.py#L443
This prevents interchanging dataframes that have categorical columns, e.g. through pandas' `from_dataframe` (https://github.com/pandas-dev/pandas/pull/46141):
>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("foo", np.asarray([4, 2, 1, 3, 3], dtype="int8")))
>>> df = df.categorize("foo")
>>> from pandas.api.exchange import from_dataframe
>>> from_dataframe(df)
.../pandas/core/exchange/from_dataframe.py:184, in categorical_column_to_series(col)
169 """
170 Convert a column holding categorical data to a pandas Series.
171
(...)
180 that keeps the memory alive.
181 """
182 categorical = col.describe_categorical
--> 184 if not categorical["is_dictionary"]:
185 raise NotImplementedError("Non-dictionary categoricals not supported yet")
187 mapping = categorical["mapping"]
TypeError: tuple indices must be integers or slices, not str
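To make the mismatch concrete without needing Vaex or pandas installed, here is a minimal sketch in plain Python. The key names `is_dictionary` and `mapping` come from the traceback above; `is_ordered`, the order of the tuple fields, the sample mapping, and the helper `as_categorical_dict` are assumptions made for illustration and are not taken from the Vaex source.

```python
# Assumed field order of the tuple; only "is_dictionary" and "mapping"
# are confirmed by the traceback, "is_ordered" is an assumption.
KEYS = ("is_ordered", "is_dictionary", "mapping")


def as_categorical_dict(desc):
    """Hypothetical helper: return a dict-style description.

    Dicts pass through unchanged; a tuple is paired with KEYS in order.
    """
    if isinstance(desc, dict):
        return desc
    return dict(zip(KEYS, desc))


# A tuple-style description (illustrative values only).
tuple_style = (False, True, {0: 1, 1: 2, 2: 3, 3: 4})

# Indexing the tuple by key fails the same way the consumer does.
try:
    tuple_style["is_dictionary"]
except TypeError as exc:
    error_message = str(exc)

# The dict-style description supports the lookups the consumer performs.
normalized = as_categorical_dict(tuple_style)
is_dictionary = normalized["is_dictionary"]
mapping = normalized["mapping"]
```

A real fix would most likely have the producer return the dict directly rather than converting on the consumer side; the helper only shows the shape the consumer expects.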
I'll submit a PR for this.