
BUG: Series.gt (and other comparison methods) can fail with dtype=object

Open warwickmm opened this issue 1 year ago • 11 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> 
>>> x = pd.Series([None], dtype=object)
>>> y = pd.Series([0])

# This raises "TypeError: '>' not supported between instances of 'NoneType' and 'int'".
>>> x.gt(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6300, in gt
    return self._flex_method(
           ^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6246, in _flex_method
    return self._binop(other, op, level=level, fill_value=fill_value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/test/venv/lib/python3.12/site-packages/pandas/core/series.py", line 6195, in _binop
    result = func(this_vals, other_vals)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>' not supported between instances of 'NoneType' and 'int'


# This runs without error.
>>> x > y
0    False
dtype: bool


# When converted to DataFrames (with object dtypes), .gt runs without error:
>>> x.to_frame().gt(y.to_frame())
       0
0  False


# If the series has dtype=float, the comparison runs without error.
>>> x.astype(float).gt(y)
0    False
dtype: bool

Issue Description

When a Series has dtype=object, comparison methods (e.g., .gt) called with another Series can raise "TypeError: '>' not supported between instances of 'NoneType' and 'int'". No error occurs when using the > operator, when calling DataFrame.gt, or when the Series has dtype=float.
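To compare the three code paths side by side on whatever pandas version is installed, something like the following sketch may help. The `outcome` helper and the `outcomes` dict are hypothetical names introduced only for this illustration; the script records what each path does and makes no claim about which result is correct.

```python
import pandas as pd


def outcome(compare):
    """Run a zero-argument comparison and describe what happened.

    Hypothetical helper for this illustration only.
    """
    try:
        result = compare()
    except TypeError as exc:
        return f"TypeError: {exc}"
    # Flatten Series and DataFrame results to a plain list for easy comparison.
    return result.to_numpy().ravel().tolist()


x = pd.Series([None], dtype=object)
y = pd.Series([0])

outcomes = {
    "Series.gt": outcome(lambda: x.gt(y)),
    "operator >": outcome(lambda: x > y),
    "DataFrame.gt": outcome(lambda: x.to_frame().gt(y.to_frame())),
}

for label, value in outcomes.items():
    print(f"{label:>12}: {value}")
```

On the versions listed under Installed Versions, the first entry should be the TypeError and the other two should be [False]; other versions may differ.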

Expected Behavior

When the Series has dtype=object, the behavior of Series.gt should be consistent with the > operator and with the DataFrame.gt method.

Installed Versions

INSTALLED VERSIONS
------------------
commit                : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python                : 3.12.4.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.10.2-arch1-1
Version               : #1 SMP PREEMPT_DYNAMIC Sat, 27 Jul 2024 16:49:55 +0000
machine               : x86_64
processor             : 
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.2
numpy                 : 2.0.1
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 71.1.0
pip                   : 23.2.1
Cython                : None
pytest                : 8.3.1
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : None
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : 3.9.1
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.14.0
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

warwickmm avatar Aug 05 '24 16:08 warwickmm

I would like to work on this

Patsnoop avatar Aug 05 '24 16:08 Patsnoop

Thanks for the report - it seems to me comparing None to e.g. integers should raise. My guess is that x > y succeeding is a result of assuming None is an NA value and hence behaves like np.nan (always false for comparisons). Further investigations are welcome!
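For reference, the two scalar behaviors being contrasted here can be checked in plain Python, without pandas. This is only a sketch of the language-level semantics (NaN compares as false, None refuses to be ordered); it does not show which of the two pandas applies internally.

```python
nan = float("nan")

# Every comparison of NaN against a number evaluates to False rather than raising.
nan_results = [nan > 0, nan < 0, nan >= 0, nan <= 0, nan == 0]

# Plain Python refuses to order None against an int.
try:
    None > 0
    none_outcome = "no error"
except TypeError as exc:
    none_outcome = f"TypeError: {exc}"

print(nan_results)   # [False, False, False, False, False]
print(none_outcome)  # TypeError: '>' not supported between instances of 'NoneType' and 'int'
```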

rhshadrach avatar Aug 05 '24 20:08 rhshadrach

take

KevsterAmp avatar Aug 06 '24 14:08 KevsterAmp

@rhshadrach - Any ideas for a fix? Do we raise an error when "<" is used between Series that contain None?

KevsterAmp avatar Aug 07 '24 08:08 KevsterAmp

That seems like the correct behavior to me - yes.

rhshadrach avatar Aug 07 '24 21:08 rhshadrach

Should DataFrame.gt raise an error as well?

warwickmm avatar Aug 07 '24 21:08 warwickmm

Also, should one expect the behavior to be consistent across all values for which pd.isna returns True (e.g., None, np.nan, pd.NA, etc.)? Or does one need to be cognizant of how missing values are represented in each instance?
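To make the question concrete, here is a small sketch of how the three missing-value markers behave as bare scalars. The `compare_to_zero` helper is a hypothetical name used only here, and these are scalar semantics, which need not match what a Series or DataFrame does with the same values.

```python
import numpy as np
import pandas as pd


def compare_to_zero(value):
    """Return a description of `value > 0` (hypothetical helper for illustration)."""
    try:
        return repr(value > 0)
    except TypeError as exc:
        return f"TypeError: {exc}"


# All three are considered missing by pd.isna, yet they compare differently.
for value in (None, np.nan, pd.NA):
    print(f"{value!r:>6}  isna={pd.isna(value)}  value > 0 -> {compare_to_zero(value)}")
```

As scalars, None raises, np.nan yields False, and pd.NA propagates as <NA>, so the three markers already disagree before any Series is involved.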

warwickmm avatar Aug 07 '24 21:08 warwickmm

My above comments are only regarding Python's None when stored in an object-dtype column or Series.

rhshadrach avatar Aug 07 '24 21:08 rhshadrach

Thanks. I'll just note that the below also currently runs without error. Not sure if that's a situation that needs to be considered as well.

>>> x = pd.Series([None], dtype=object)
>>> x.gt(0)
0    False
dtype: bool

warwickmm avatar Aug 07 '24 23:08 warwickmm

Hi @warwickmm! Are you working on this? If not, I would like to take this up.

maushumee avatar Aug 27 '24 21:08 maushumee

I am not.

warwickmm avatar Aug 27 '24 23:08 warwickmm

take

maushumee avatar Aug 28 '24 13:08 maushumee