vaex
[BUG-REPORT] Interchange `Column.dtype` returns format strings in NumPy-style, instead of Arrow-style
In the interchange protocol, `Column.dtype` should return an Arrow-style format string, but a NumPy-style one is returned instead:
>>> import numpy as np
>>> import vaex
>>> df = vaex.from_items(("foo", np.asarray([0, 1, 2], dtype="int64")))
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_col.dtype
(<_DtypeKind.INT: 0>, 8, '<i8', '|')
This happens with Arrow-backed columns too:
>>> import pyarrow as pa
>>> table = pa.Table.from_pydict({"foo": [0, 1, 2]})
>>> df = vaex.from_arrow_table(table)
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_col.dtype
(<_DtypeKind.INT: 0>, 64, '<i8', '=')
It looks like the `.str` attribute of the equivalent NumPy dtype object is currently returned as-is:
https://github.com/vaexio/vaex/blob/35c250d585f889272b8ef1096de6fa5462816f52/packages/vaex-core/vaex/dataframe_protocol.py#L407-L410
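For illustration, here is a minimal sketch of the kind of translation that seems to be missing. It assumes the target is the Arrow C Data Interface format strings (for example `l` for int64, `g` for float64). The helper name `numpy_str_to_arrow_format` and the lookup table are hypothetical and not part of vaex; only fixed-width integer, float, and boolean types are covered.

```python
# Hypothetical helper, NOT part of vaex: translate a NumPy-style type string
# (what `np.dtype(...).str` returns, e.g. '<i8') into the corresponding
# Arrow C Data Interface format string (e.g. 'l').

# (NumPy kind character, item size in bytes) -> Arrow format string
_ARROW_FORMATS = {
    ("b", 1): "b",  # boolean
    ("i", 1): "c",  # int8
    ("i", 2): "s",  # int16
    ("i", 4): "i",  # int32
    ("i", 8): "l",  # int64
    ("u", 1): "C",  # uint8
    ("u", 2): "S",  # uint16
    ("u", 4): "I",  # uint32
    ("u", 8): "L",  # uint64
    ("f", 2): "e",  # float16
    ("f", 4): "f",  # float32
    ("f", 8): "g",  # float64
}


def numpy_str_to_arrow_format(type_str):
    """Map a NumPy type string such as '<i8' to an Arrow format string."""
    # Drop the leading byte-order character if present; byte order is
    # reported in a separate field of the dtype tuple.
    if type_str[0] in "<>=|":
        type_str = type_str[1:]
    kind, itemsize = type_str[0], int(type_str[1:])
    try:
        return _ARROW_FORMATS[(kind, itemsize)]
    except KeyError:
        raise NotImplementedError(
            f"no Arrow format string known for NumPy type {type_str!r}"
        ) from None


print(numpy_str_to_arrow_format("<i8"))  # l
```

With a mapping like this, the third element of the dtype tuple for the `foo` column above would be `'l'` rather than `'<i8'`.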