[BUG] Nulls render incorrectly in pandas compatibility mode
Describe the bug
In pandas compatibility mode, columns of float type render null entries as "nan" instead of NA.
The value is being produced here. The current rendering seems to have been chosen deliberately in #19478; perhaps @galipremsagar will remember why. Otherwise we can just open a PR changing it and see what breaks. I'm assuming it will be one of the pandas tests that we re-enabled in that PR, and from there we can work out what we need to do to make the rendering decision more precise.
Steps/Code to reproduce bug
>>> import pandas as pd
>>> import cudf
>>> pd.Series([1, pd.NA])
0 1
1 <NA>
dtype: object
>>> with cudf.option_context("mode.pandas_compatible", True):
...     cudf.Series([1, None], dtype="float")
...
0 1.0
1 nan
dtype: float64
>>> with cudf.option_context("mode.pandas_compatible", False):
...     cudf.Series([1, None], dtype="float")
...
0 1.0
1 <NA>
dtype: float64
Expected behavior
The rendering should be the same as pandas.
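For reference, here is a small pandas-only sketch of what I understand the target behavior to be for float data. It is not taken from the cudf test suite; the variable names are just for illustration. With the NumPy-backed "float" dtype the missing entry is stored as NaN, while the nullable "Float64" extension dtype keeps it as pd.NA.

```python
import numpy as np
import pandas as pd

# NumPy-backed float64: None is stored as NaN, there is no separate null mask.
numpy_backed = pd.Series([1, None], dtype="float")

# Nullable extension dtype: None is stored as pd.NA behind a validity mask.
nullable = pd.Series([1, None], dtype="Float64")

print(numpy_backed.dtype, nullable.dtype)      # float64 Float64
print(np.isnan(numpy_backed.to_numpy()[1]))    # True
print(nullable[1] is pd.NA)                    # True

# Both report the entry as missing through the public API.
print(numpy_backed.isna().tolist())            # [False, True]
print(nullable.isna().tolist())                # [False, True]
```

So the comparison that matters for dtype="float" is against the NaN case, not the object-dtype Series in the repro above.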
After some discussion with Prem, the issue isn't just the rendering but the underlying value, which we can see by converting to pylibcudf:
>>> import pylibcudf as plc
>>> cudf.get_option("mode.pandas_compatible")
True
>>> s = cudf.Series([1, None], dtype="float")
>>> s
0 1.0
1 nan
dtype: float64
>>> plc.unary.is_null(s.to_pylibcudf()[0]).to_arrow()
<pyarrow.lib.BooleanArray object at 0x7f59fae2bca0>
[
false,
true <<<< THIS IS A BUG
]
>>> s = cudf.Series([1, None], dtype="Float64")
>>> s
0 1.0
1 <NA>
dtype: Float64
>>> plc.unary.is_null(s.to_pylibcudf()[0]).to_arrow()
<pyarrow.lib.BooleanArray object at 0x7f59fae2bca0>
[
false,
true
]
In pandas compatibility mode, when using non-extension dtypes (e.g. "float" above), nulls should be converted to NaNs in our internal representation. As shown above, that is not happening.
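To make the expected conversion concrete, here is a rough pure-Python sketch of the semantics. `MaskedColumn` and `nulls_to_nans` are hypothetical names made up for this example; they are not cudf or pylibcudf APIs, and the real fix would presumably happen at column construction time on the device.

```python
import math
from dataclasses import dataclass


@dataclass
class MaskedColumn:
    """Hypothetical stand-in for a float column with a validity mask."""

    values: list[float]
    valid: list[bool]  # False marks a null slot


def nulls_to_nans(col: MaskedColumn) -> MaskedColumn:
    """Replace every null slot with NaN and mark all slots as valid."""
    new_values = [v if ok else math.nan for v, ok in zip(col.values, col.valid)]
    return MaskedColumn(new_values, [True] * len(new_values))


# Same shape as cudf.Series([1, None], dtype="float"): second slot is null.
col = MaskedColumn(values=[1.0, 0.0], valid=[True, False])
converted = nulls_to_nans(col)

# After conversion an is_null-style check finds nothing,
# while an is_nan-style check flags the second entry.
null_flags = [not ok for ok in converted.valid]
nan_flags = [math.isnan(v) for v in converted.values]
print(null_flags)  # [False, False]
print(nan_flags)   # [False, True]
```

Under this reading, the first `plc.unary.is_null` call above would return `[false, false]` for the "float" Series, and only the "Float64" Series would keep a true null.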