BUG: Series.gt (and other comparison methods) can fail with dtype=object
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd
>>>
>>> x = pd.Series([None], dtype=object)
>>> y = pd.Series([0])
# This raises "TypeError: '>' not supported between instances of 'NoneType' and 'int'":
>>> x.gt(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6300, in gt
    return self._flex_method(
           ^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6246, in _flex_method
    return self._binop(other, op, level=level, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6195, in _binop
    result = func(this_vals, other_vals)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'
# This runs without error.
>>> x > y
0    False
dtype: bool
# When converted to DataFrames (with object dtypes), .gt runs without error:
>>> x.to_frame().gt(y.to_frame())
       0
0  False
# If the series has dtype=float, the comparison runs without error.
>>> x.astype(float).gt(y)
0    False
dtype: bool
Issue Description
When a Series has dtype=object, comparison methods such as .gt can raise "TypeError: '>' not supported between instances of 'NoneType' and 'int'". No error is raised when using the > operator, when calling DataFrame.gt, or when the Series has dtype=float.
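The error message in the traceback is the one Python itself produces: the last frame applies the comparison as func(this_vals, other_vals), and Python 3 defines no ordering between None and int. A minimal stdlib-only sketch of that failure mode, assuming the object-dtype path behaves like operator.gt applied pair by pair (elementwise_gt is a hypothetical helper for illustration, not a pandas function or pandas' actual code path):

```python
import operator


def elementwise_gt(left, right):
    """Apply '>' pair by pair with plain Python semantics.

    Hypothetical simplification of an object-dtype comparison;
    pandas does not necessarily implement it this way.
    """
    return [operator.gt(a, b) for a, b in zip(left, right)]


message = None
try:
    elementwise_gt([None], [0])
except TypeError as exc:
    # Python 3 has no ordering between None and int, so '>' raises.
    message = str(exc)

print(message)  # '>' not supported between instances of 'NoneType' and 'int'
```

Nothing in this sketch treats None as missing, which is why it fails in the same way Series.gt does.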
Expected Behavior
When the Series has dtype=object, the behavior of Series.gt should be consistent with the > operator and with the DataFrame.gt method.
Installed Versions
INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.12.4.final.0
python-bits : 64
OS : Linux
OS-release : 6.10.2-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Sat, 27 Jul 2024 16:49:55 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 71.1.0
pip : 23.2.1
Cython : None
pytest : 8.3.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
I would like to work on this.
Thanks for the report. It seems to me that comparing None to, e.g., integers should raise. My guess is that x > y succeeds because None is treated as an NA value and hence behaves like np.nan (comparisons with it are always False). Further investigation is welcome!
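The guess above can be illustrated with a plain-Python sketch of NaN-like semantics, where any comparison involving a missing value evaluates to False. This is an assumption-laden illustration, not how pandas implements >: is_missing and na_masked_gt are hypothetical helpers, and is_missing only recognizes None and float NaN (a rough stand-in for pd.isna).

```python
import math
import operator


def is_missing(value):
    """Treat None and float NaN as missing (simplified stand-in for pd.isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_masked_gt(left, right):
    """Hypothetical NA-aware '>': a missing value on either side yields False,
    mirroring the fact that comparisons against NaN are always False."""
    return [
        False if is_missing(a) or is_missing(b) else operator.gt(a, b)
        for a, b in zip(left, right)
    ]


print(na_masked_gt([None, 2, float("nan")], [0, 1, 0]))  # [False, True, False]
print(float("nan") > 0)  # False
```

Under this masking, [None] > [0] yields [False], matching what x > y returned in the report, whereas comparing None directly raises.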
take
@rhshadrach Any ideas for a fix? Should we raise an error when "<" is used between Series that contain None?
That seems like the correct behavior to me - yes.
Should DataFrame.gt raise an error as well?
Also, should one expect the behavior to be consistent across all values for which pd.isna returns True (e.g., None, np.nan, pd.NA, etc.)? Or does one need to be cognizant of how missing values are represented in each instance?
My above comments are only regarding Python's None when stored in an object-dtype column or Series.
Thanks. I'll just note that the following also currently runs without error. I'm not sure whether that case needs to be considered as well.
>>> x = pd.Series([None], dtype=object)
>>> x.gt(0)
0    False
dtype: bool
Hi @warwickmm! Are you working on this? If not, I would like to take this up.
I am not.
take