pandas BUG: NamedTuples do not match tuples in pandas.Index

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

$ ipython
Python 3.11.8 (main, Mar 19 2024, 17:46:15) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.22.2 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas

In [2]: import collections

In [3]: MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")

In [4]: first = MyNamedTuple('identity','1234')

In [5]: idx = pandas.Index([('identity','1234')])

In [6]: idx
Out[6]: 
MultiIndex([('identity', '1234')],
           )

In [7]: idx2 = idx.to_flat_index()

In [8]: idx2
Out[8]: Index([('identity', '1234')], dtype='object')

In [9]: first in idx
Out[9]: True

In [10]: first in idx2
Out[10]: False

In [11]: first in idx2.to_list()
Out[11]: True

In [12]: first == idx2[0]
Out[12]: True

In [13]: pandas.__version__
Out[13]: '2.2.1'

In [14]: idx.get_loc(first)
Out[14]: 0

In [15]: idx2.get_loc(first)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: MyNamedTuple(id='identity', sub_id='1234')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[15], line 1
----> 1 idx2.get_loc(first)

File ~/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: MyNamedTuple(id='identity', sub_id='1234')

In [16]:

Issue Description

After upgrading from pandas 1.2.5 to pandas 1.3.5, I noticed that I could no longer use a NamedTuple to reference DataFrame columns whose labels are tuples: the lookup raises a KeyError. I then installed the latest pandas and reduced the issue to pandas.Index.get_loc. The lookup does work when I leave the Index as a MultiIndex.

Note: the lookup succeeds in roughly 25% of my attempts, so if you see it succeed, please try again.
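A possible workaround, sketched here and not confirmed anywhere in this report, is to convert the named tuple to a plain tuple before the lookup. This assumes the failure is tied to the key being a tuple subclass rather than an exact tuple. The helper name `get_loc_plain` is hypothetical and not part of pandas.

```python
import collections

import pandas

MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")
first = MyNamedTuple("identity", "1234")

# Flat object-dtype Index holding a plain tuple, built as in the report.
idx2 = pandas.Index([("identity", "1234")]).to_flat_index()


def get_loc_plain(index, key):
    """Hypothetical helper: convert tuple subclasses to plain tuples, then look up."""
    if isinstance(key, tuple) and type(key) is not tuple:
        key = tuple(key)  # drops the namedtuple type, keeps the field values
    return index.get_loc(key)


position = get_loc_plain(idx2, first)
print(position)
```

The conversion only changes the key's type, so the same helper can be used with plain tuple keys unchanged.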

Expected Behavior

NamedTuples should match regular tuples, as they do elsewhere in Python (illustrated by the fact that they match when one calls idx.to_list()).
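For reference, a minimal standard-library sketch of the behavior described above: a named tuple compares equal to, and hashes the same as, a plain tuple with the same contents, so it works as a key in built-in containers. The variable names are illustrative only.

```python
import collections

MyNamedTuple = collections.namedtuple("MyNamedTuple", "id sub_id")
named = MyNamedTuple("identity", "1234")
plain = ("identity", "1234")

# namedtuple inherits __eq__ and __hash__ from tuple.
equal = named == plain
same_hash = hash(named) == hash(plain)

# Built-in containers therefore treat the two as the same key.
in_list = named in [plain]
in_set = named in {plain}
from_dict = {plain: "found"}.get(named)

print(equal, same_hash, in_list, in_set, from_dict)
```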

Installed Versions

In [16]: pandas.show_versions()
/home/russellm/.pyenv/versions/3.11.8/envs/test-venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS

commit                : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python                : 3.11.8.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.5.0-21-generic
Version               : #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 9 13:32:52 UTC 2
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.2.1
numpy                 : 1.26.4
pytz                  : 2024.1
dateutil              : 2.9.0.post0
setuptools            : 65.5.0
pip                   : 24.0
Cython                : None
pytest                : None
hypothesis            : None
sphinx                : None
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : None
html5lib              : None
pymysql               : None
psycopg2              : None
jinja2                : None
IPython               : 8.22.2
pandas_datareader     : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
gcsfs                 : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
pyarrow               : None
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

In [17]:

Mar 19 '24 23:03 Apteryx0

FWIW

In [20]: idx2._engine
Out[20]: <pandas._libs.index.ObjectEngine at 0x7f071ddcab00>

In [21]: idx2._engine.values
Out[21]: array([('identity', '1234')], dtype=object)

In [22]: first in idx2._engine.values
Out[22]: False

In [23]: first == idx2._engine.values[0]
Out[23]: True

In [24]: hash(first)
Out[24]: 5766037510587733218

In [25]: hash(idx2._engine.values[0])
Out[25]: 5766037510587733218

In [26]:

Mar 19 '24 23:03 Apteryx0

This happens on main as well, including the nondeterministic success in a minority of attempts.

Mar 19 '24 23:03 dontgoto

Thanks for the report. In general you will find very little support for containers as elements of an index or columns.

Related: https://github.com/pandas-dev/pandas/pull/57004#issuecomment-1906984802

Mar 20 '24 21:03 rhshadrach

Hi, my partner (GitHub: mwanink) and I would like to work on this issue.

Mar 28 '24 21:03 20revsined

take

Mar 28 '24 21:03 20revsined

take

Jul 18 '24 21:07 matiaslindgren