pandas BUG: CONTAINS_OP run on pd.NA results in pd.NAType.__bool__ call

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.NA in [1,2,3]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "missing.pyx", line 392, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

Issue Description

Checking whether pd.NA is in a list raises TypeError: boolean value of NA is ambiguous.
Why does performing the in operation call the __bool__ method of the pd.NAType class?

Seems a bit similar to the issue regarding incorrect implementation of some operators: https://github.com/pandas-dev/pandas/issues/49828

Expected Behavior

Checking for existence of pd.NA type in any container should correctly return either True or False
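
One possible workaround, sketched here on the assumption that the goal is only to test whether pd.NA itself is one of the elements: compare by identity, which never invokes == and so never reaches NAType.__bool__. The helper name contains_na is hypothetical and not part of pandas.

```python
import pandas as pd


def contains_na(items):
    """Return True if pd.NA itself is one of the elements of `items`."""
    # `is` compares identity only, so NAType.__eq__ and __bool__ are never called.
    return any(item is pd.NA for item in items)


no_na = contains_na([1, 2, 3])
has_na = contains_na([1, pd.NA, 3])
```

This matches only the pd.NA singleton. If np.nan and None should count as missing too, pd.isna(item) could replace the identity check.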

Installed Versions

INSTALLED VERSIONS

commit : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python : 3.10.13.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.1
numpy : 1.26.3
pytz : 2024.1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Mar 25 '24 00:03 filip-komarzyniec

Thanks for the report - this is a consequence of having comparisons return pd.NA:

print(pd.NA == 1)
# <NA>

When Python checks whether pd.NA == 1, the result is NA. Python then evaluates the truthiness of that result, which gives you the TypeError as reported. As long as we are returning pd.NA on comparisons, I do not believe anything can be done here.
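
The mechanism can be sketched without pandas. The class AmbiguousNA below is a hypothetical stand-in that only mimics the two relevant behaviours of pd.NA (comparisons return the object itself, truthiness raises), and contains is a rough pure-Python approximation of how a list membership test proceeds, not the actual CPython implementation.

```python
class AmbiguousNA:
    """Hypothetical stand-in for pd.NA, for illustration only."""

    def __eq__(self, other):
        # Like pd.NA, a comparison yields the missing value, not True/False.
        return self

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")


NA = AmbiguousNA()


def contains(container, needle):
    """Approximation of `needle in container` for a list."""
    for item in container:
        # Identity is tried first; otherwise the result of == is coerced to bool.
        if item is needle or item == needle:
            return True
    return False


# Identity short-circuits, so no comparison result needs a truth value.
found_by_identity = NA in [NA, 1]

# Here 1 == NA falls back to NA.__eq__, which returns NA, and bool(NA) raises.
try:
    NA in [1, 2, 3]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Because identity is tried before equality, the membership test succeeds only when the missing value happens to come before any element that needs an == comparison.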

cc @jorisvandenbossche @phofl

Mar 25 '24 01:03 rhshadrach

We intend to change this to return false (discussed in Basel), should probably get this into 3.0

Mar 25 '24 01:03 phofl

take

Mar 29 '24 19:03 20revsined

We intend to change this to return false (discussed in Basel), should probably get this into 3.0

@phofl Would this change only apply to boolean ops, or do you anticipate changing the behavior of numerical ops like 1 + pd.NA as well?

Apr 06 '24 16:04 asishm

No, it's only bool(pd.NA) that we want to change.
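
To make the distinction concrete, here is a small sketch of the two kinds of operation. The TypeError branch reflects the behaviour on the version reported in this issue (2.2.1); what bool(pd.NA) does in later releases is not something this sketch assumes.

```python
import pandas as pd

# Arithmetic and comparisons propagate the missing value instead of raising.
total = 1 + pd.NA
comparison = pd.NA == 1

# Truthiness is the separate step under discussion.
# On the reported version it raises TypeError, which is captured here.
try:
    truthiness = bool(pd.NA)
except TypeError as exc:
    truthiness = exc
```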

@20revsined this is probably not a good issue for a beginner in pandas

Apr 06 '24 19:04 phofl

I don't know if my issue is related to this, please remove my comment if not!

I have a function which gives me the following output (pd df):

timestamp	duration	trial_type	blink	message
9199380	<NA>	NaN	<NA>	RECORD_START
9199345	392	fixation	0	NaN
etc...

column dtypes are:

timestamp     Int64
duration      Int64
trial_type    object
blink         Int64
message       object
dtype: object

To be precise: timestamp and duration hold numerics plus NaNs, trial_type holds strings plus NaNs, blink holds numerics (0 and 1) plus NaNs, and message holds strings plus NaNs.

Now I wrote a unit test to test the output for the first row:

import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "folder, expected",
    [
        ("emg", [9199380, pd.NA, np.nan, pd.NA, "RECORD_START"]),
        # + other folders, removed for simplicity
    ],
)

def test_physioevents_value(folder, expected, eyelink_test_data_dir):
    input_dir = eyelink_test_data_dir / folder
    asc_file = asc_test_files(input_dir=input_dir, suffix="*_events")[0]
    events = _load_asc_file(asc_file)
    events_after_start = _df_events_after_start(events)
    physioevents_reordered = _df_physioevents(events_after_start)
    physioevents_eye1 = _physioevents_eye1(physioevents_reordered)
    assert physioevents_eye1.iloc[0].tolist() == expected

And the list obviously looks like this: [9199380, <NA>, nan, <NA>, 'RECORD_START']

I get the following error when running the test:

E   AssertionError: assert [9199380, <NA>...CORD_START'] == [9199380, <NA>...CORD_START']
E
E   (pytest_assertion plugin: representation of details failed: missing.pyx:392: TypeError: boolean value of NA is ambiguous.
E   Probably an object has a faulty repr.)

tests/test_edf2bids.py:670: AssertionError

So I guess I cannot use pd.NA to check if the value in that field is <NA>. However, I also cannot check it using "<NA>", i.e. encoding it as a string.

How can I check whether pd.NA values exist in the dataframe?

I tried changing the dtypes so that every column has the dtype 'object'. However, that's not really what I want.
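
For illustration only, one way such a comparison could be written is to test missingness with pd.isna before comparing values. This is a sketch under stated assumptions: rows_match is a hypothetical helper, every element is a scalar, and pd.NA, np.nan and None are all treated as interchangeable "missing" markers.

```python
import numpy as np
import pandas as pd


def rows_match(actual, expected):
    """Compare two lists element-wise, treating missing values as equal to each other."""
    if len(actual) != len(expected):
        return False
    for got, want in zip(actual, expected):
        got_missing, want_missing = pd.isna(got), pd.isna(want)
        if got_missing or want_missing:
            # A value that is missing on one side only is a mismatch.
            if not (got_missing and want_missing):
                return False
        elif got != want:
            return False
    return True


row = [9199380, pd.NA, np.nan, pd.NA, "RECORD_START"]
same = rows_match(row, [9199380, pd.NA, np.nan, pd.NA, "RECORD_START"])
different = rows_match(row, [9199380, 392, np.nan, pd.NA, "RECORD_START"])
```

In a test, this would replace the direct list comparison with something like assert rows_match(physioevents_eye1.iloc[0].tolist(), expected). If pd.NA and np.nan must be told apart, an identity check such as got is pd.NA would be needed instead of pd.isna.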

Jun 19 '24 10:06 julia-pfarr

While somewhat related, this:

How can I check whether pd.NA values exist in the dataframe?

is more of a usage question. Please try asking on StackOverflow first - if you don't get your question resolved in a few days, open a new issue here and link to your SO post. We do this as otherwise we fear our issue tracker would be flooded with usage questions.

Jun 19 '24 20:06 rhshadrach

Great, thank you for your reply! I already asked on SO a couple of days ago. I'll wait a bit more and then do as you asked if I don't get it resolved otherwise :-)

Jun 20 '24 09:06 julia-pfarr
