BUG: Identity checking NA in `map` is incorrect

Open mroeschke opened this issue 1 year ago • 4 comments

In [2]: pd.Series([pd.NA], dtype="Int64").map(lambda x: 1 if x is pd.NA else 2)
Out[2]: 
0    2
dtype: int64

In pandas 2.1, the same call returned 1:

In [2]: pd.Series([pd.NA], dtype="Int64").map(lambda x: 1 if x is pd.NA else 2)
Out[2]: 
0    1

This is probably because we call to_numpy before going through map_array
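To illustrate the suspected cause, here is a minimal sketch of what happens to an identity check once a masked array has been converted to a float NumPy array. The explicit `dtype="float64", na_value=np.nan` arguments are an assumption, used so the coercion is reproduced regardless of the installed pandas version; this is not the actual code path inside `map`. The helper name `flag_missing` is hypothetical.

```python
import numpy as np
import pandas as pd

arr = pd.array([1, pd.NA], dtype="Int64")

# Assumption: request the float conversion explicitly so the result
# does not depend on the pandas version's to_numpy defaults.
values = arr.to_numpy(dtype="float64", na_value=np.nan)

missing = values[1]
print(missing is pd.NA)  # the identity check no longer matches
print(pd.isna(missing))  # a value-based check still detects the missing entry


def flag_missing(x):
    # Hypothetical UDF: pd.isna handles both pd.NA and np.nan,
    # so it does not depend on which one map passes in.
    return 1 if pd.isna(x) else 2


result = pd.Series([pd.NA], dtype="Int64").map(flag_missing)
print(result.iloc[0])
```

Using `pd.isna(x)` in the UDF sidesteps the identity check, at the cost of no longer distinguishing pd.NA from np.nan.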

mroeschke avatar Feb 13 '24 00:02 mroeschke

I hit the same issue in 2.2.0. In https://github.com/pandas-dev/pandas/issues/56606#issuecomment-1871319732, it was mentioned that this was the expected behavior going forward. Is this no longer the case?

rohanjain101 avatar Feb 13 '24 20:02 rohanjain101

Ah thanks @rohanjain101, I didn't realize you had opened https://github.com/pandas-dev/pandas/issues/56606

I would say that, in an ideal world, pd.NA still shouldn't get coerced to np.nan when evaluating a UDF (and without going through object dtype).
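For reference, the object-dtype detour mentioned above can be sketched as follows. This is only an illustration of the workaround, not the proposed fix: casting to object first keeps pd.NA as the element itself, so the identity check inside the UDF works, but it gives up the nullable dtype along the way.

```python
import pandas as pd

s = pd.Series([pd.NA, 3], dtype="Int64")

# Casting to object keeps pd.NA as the actual element
# instead of a float NaN.
as_object = s.astype(object)

# The identity check now sees the pd.NA singleton.
result = as_object.map(lambda x: 1 if x is pd.NA else 2)
print(result.tolist())
```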

mroeschke avatar Feb 15 '24 23:02 mroeschke

take

droussea2001 avatar Apr 21 '24 12:04 droussea2001

Hi @mroeschke, for information, I created a PR (https://github.com/pandas-dev/pandas/pull/58392).

The idea is simply to prevent pd.NA values from being converted to np.nan by the to_numpy call, so that pd.NA values stay pd.NA after a map operation.

That's why test_map and test_map_na_action_ignore were modified: the modified tests now expect pd.NA to be kept after a map.

Would it be acceptable to handle the problem this way?

droussea2001 avatar May 13 '24 08:05 droussea2001