vaex
[BUG-REPORT] `Column.describe_categorical` raises when column names start with numbers
A weird bug(?) I stumbled upon: when I use numeric strings as names for categorical columns and then access them through the interchange protocol, `describe_categorical` raises a `TypeError`, as if the column were not categorical.
```python
>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("42", np.asarray([3, 1, 1, 2, 0])))
>>> df = df.categorize("42")
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("42")
>>> interchange_col.describe_categorical
File .../vaex/dataframe_protocol.py:434, in _VaexColumn.describe_categorical(self)
416 """
417 If the dtype is categorical, there are two options:
418
(...)
431 None if not a dictionary-style categorical.
432 """
433 if not self.dtype[0] == _DtypeKind.CATEGORICAL:
--> 434 raise TypeError("`describe_categorical only works on a column with " "categorical dtype!")
436 ordered = False
437 is_dictionary = True
TypeError: `describe_categorical only works on a column with categorical dtype!
```
This works fine (well, apart from #2113) if the name instead starts with a letter:
```python
>>> df = vaex.from_items(("a42", np.asarray([3, 1, 1, 2, 0])))
>>> df = df.categorize("a42")
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("a42")
>>> interchange_col.describe_categorical
(False, True, {0: 0, 1: 1, 2: 2, 3: 3})
```
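One observable difference between the two names, offered only as a guess and not a confirmed cause: `"42"` is not a valid Python identifier, while `"a42"` is. Since vaex works with Python-like expression strings, names that are not identifiers may take a different code path somewhere between `categorize` and the interchange `dtype` lookup. The helper `is_identifier_name` below is hypothetical and exists only to make the comparison explicit; it uses nothing but the standard `str.isidentifier`.

```python
# Compare the two column names used in this report.
# Assumption (unconfirmed): the failure correlates with names that are
# not valid Python identifiers, not with digits as such.

def is_identifier_name(name: str) -> bool:
    """Hypothetical helper: True if `name` is a valid Python identifier."""
    return name.isidentifier()


for name in ["42", "a42"]:
    # Prints whether each column name is usable as a bare identifier.
    print(f"{name!r}: isidentifier={is_identifier_name(name)}")
```

If the guess holds, other non-identifier names such as `"my col"` should fail the same way, which would be a quick way to narrow it down.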
Using a local build of upstream master.