
[BUG-REPORT] `Column.describe_categorical` raises when column names start with numbers

Open honno opened this issue 2 years ago • 0 comments

Weird bug(?) I stumbled upon when using numeric strings as names for categorical columns and then trying to use the interchange protocol on them.

>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("42", np.asarray([3, 1, 1, 2, 0])))
>>> df = df.categorize("42")
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("42")
>>> interchange_col.describe_categorical
File .../vaex/dataframe_protocol.py:434, in _VaexColumn.describe_categorical(self)
    416 """
    417 If the dtype is categorical, there are two options:
    418 
   (...)
    431                   None if not a dictionary-style categorical.
    432 """
    433 if not self.dtype[0] == _DtypeKind.CATEGORICAL:
--> 434     raise TypeError("`describe_categorical only works on a column with " "categorical dtype!")
    436 ordered = False
    437 is_dictionary = True
TypeError: `describe_categorical only works on a column with categorical dtype!

This works fine (well, aside from #2113) if, say, the name starts with a letter:

>>> df = vaex.from_items(("a42", np.asarray([3, 1, 1, 2, 0])))
>>> df = df.categorize("a42")
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("a42")
>>> interchange_col.describe_categorical
(False, True, {0: 0, 1: 1, 2: 2, 3: 3})
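The only difference I can see between the two cases is that "42" is not a valid Python identifier while "a42" is. I haven't confirmed that this is what trips up vaex, so treat it as a guess. A minimal sketch of the check, using only the standard library (`is_valid_identifier` is a hypothetical helper of mine, not a vaex function):

```python
# Hypothetical helper, NOT part of vaex: reports whether a column name
# would be a valid Python identifier. Assumption: identifier validity is
# the relevant difference between the failing and the working example.
def is_valid_identifier(name: str) -> bool:
    return name.isidentifier()


for name in ("42", "a42"):
    # "42" starts with a digit, so it is not a valid identifier; "a42" is.
    print(f"{name!r}: valid identifier = {is_valid_identifier(name)}")
```

If the guess is right, other names that are not valid identifiers would presumably fail the same way, but I haven't tried that.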

Using a local build of upstream master.

honno, Jul 27 '22 09:07