modin
Interchange `Column.null_count` is a NumPy scalar, not a builtin `int`
A `PandasProtocolColumn` returns `null_count` as a NumPy integer scalar (e.g. `numpy.int64`) rather than the builtin `int` specified by the interchange protocol.
```python
>>> from modin import pandas as mpd
>>> df = mpd.DataFrame({"foo": [42]})
>>> interchange_df = df.__dataframe__()
>>> interchange_col = interchange_df.get_column_by_name("foo")
>>> interchange_col.null_count
0
>>> type(interchange_col.null_count)  # should be Python's int
<class 'numpy.int64'>
```
This seems to be because the `null_count` implementation uses `DataFrame.squeeze()`, which returns a NumPy scalar rather than an `int`.
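The `squeeze()` behaviour can be reproduced with plain pandas. This is a minimal sketch, assuming the count is held in a 1x1 frame before being squeezed (the exact Modin internals are not reproduced here), and `to_builtin_int` is a hypothetical helper, not part of Modin's API. It shows that the squeezed value is a NumPy integer that is not an `int`, and that wrapping it in `int()` yields the builtin type the protocol expects.

```python
import numpy as np
import pandas as pd

# A 1x1 frame holding a null count; assumed to resemble what a
# squeeze()-based null_count implementation reduces to a scalar.
counts = pd.DataFrame({"foo": [0]})
squeezed = counts.squeeze()

# squeeze() on a 1x1 DataFrame yields a NumPy integer scalar.
print(type(squeezed))
print(isinstance(squeezed, int))  # False


def to_builtin_int(value) -> int:
    """Hypothetical helper: convert a NumPy integer scalar to a Python int."""
    # int() accepts NumPy integer scalars; value.item() would also work.
    return int(value)


fixed = to_builtin_int(squeezed)
print(type(fixed))
```

Converting at the point where the property returns keeps the rest of the implementation unchanged while matching the protocol's `int` return type.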
https://github.com/modin-project/modin/blob/9b33451648a3192e93c46ac6961627ed2858c7fd/modin/core/dataframe/pandas/exchange/dataframe_protocol/column.py#L222-L245
Related: https://github.com/pandas-dev/pandas/issues/47789
@honno thank you for reporting the issue. I can reproduce it with your code at 5af9832d7fad3d17f05d63908bc377e61542d953.