pandas icon indicating copy to clipboard operation
pandas copied to clipboard

BUG: Boolean Series (actually object) with <NA> values breaks ~ negation and reverts to bit-wise operations

Open RedAero opened this issue 1 year ago • 0 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

~pd.Series([True, False]) => [False, True]
~pd.Series([True, False, pd.NA]) => [-2, -1, <NA>]

~pd.Series([True, False, pd.NA], dtype=pd.BooleanDtype()) => [False, True, <NA>]

Issue Description

In the example it is of course possible to work around the issue with forcing the dtype, but in the context of actual dataframes coming from real databases, NULLs are still best handled by simply replacing them with empty strings and hoping for the best, because no NULL-like object seems to work well with ordinary string and filter operations.

As a sidenote, if the pd.NA is instead a None, a TypeError is thrown.

#59831 is a related issue.

Expected Behavior

I'd expect the T/F values to flip and the NAs to remain by default, as they do when the dtype is forced.

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140 python : 3.11.6.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19045 machine : AMD64 processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD byteorder : little LC_ALL : None LANG : en LOCALE : English_United Kingdom.1252

pandas : 2.2.2 numpy : 1.26.4 pytz : 2023.3 dateutil : 2.8.2 setuptools : 68.2.2 pip : 24.2 Cython : 3.0.5 pytest : 7.4.2 hypothesis : 6.87.1 sphinx : 6.1.3 blosc : None feather : 0.4.1 xlsxwriter : 3.1.2 lxml.etree : 4.9.3 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : 3.1.2 IPython : 8.18.1 pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.2 bottleneck : 1.3.7 dataframe-api-compat : None fastparquet : 2023.8.0 fsspec : 2023.9.2 gcsfs : None matplotlib : 3.9.0 numba : 0.60.0 numexpr : 2.8.7 odfpy : None openpyxl : 3.1.2 pandas_gbq : None pyarrow : 13.0.0 pyreadstat : None python-calamine : None pyxlsb : 1.0.10 s3fs : None scipy : 1.14.0 sqlalchemy : 2.0.22 tables : 3.9.2 tabulate : 0.9.0 xarray : 2023.11.0 xlrd : 2.0.1 zstandard : 0.22.0 tzdata : 2022.7 qtpy : 2.4.1 pyqt5 : None

RedAero avatar Oct 15 '24 12:10 RedAero