pandas BUG: ValueError with loc[] = (Regression 2.1.0)

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame(index=[1, 1, 2, 2], data=["1", "1", "2", "2"])
df.loc[df[0].str.len() > 1, 0] = df[0]
df

Issue Description

The code above fails on the third line with the exception given below. Code executes normally with pandas versions <2.1.0.

Traceback (most recent call last):
  File "", line 3, in
  File "C:\Users\venv\lib\site-packages\pandas\core\indexing.py", line 885, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "C:\Users\venv\lib\site-packages\pandas\core\indexing.py", line 1888, in _setitem_with_indexer
    indexer, value = self._maybe_mask_setitem_value(indexer, value)
  File "C:\Users\venv\lib\site-packages\pandas\core\indexing.py", line 789, in _maybe_mask_setitem_value
    value = self.obj.iloc._align_series(indexer, value)
  File "C:\Users\venv\lib\site-packages\pandas\core\indexing.py", line 2340, in _align_series
    return ser.reindex(new_ix)._values
  File "C:\Users\venv\lib\site-packages\pandas\core\series.py", line 4982, in reindex
    return super().reindex(
  File "C:\Users\venv\lib\site-packages\pandas\core\generic.py", line 5521, in reindex
    return self._reindex_axes(
  File "C:\Users\venv\lib\site-packages\pandas\core\generic.py", line 5544, in _reindex_axes
    new_index, indexer = ax.reindex(
  File "C:\Users\venv\lib\site-packages\pandas\core\indexes\base.py", line 4433, in reindex
    raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
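The last frames of the traceback come down to a Series.reindex call on a Series whose index has duplicate labels. The following is a minimal sketch of that step in isolation, not the actual pandas internals: the helper reindex_raises is hypothetical, and the empty target index is an assumption standing in for "no rows selected by the mask".

```python
import pandas as pd

# Same labels and values as column 0 of the frame in the report.
ser = pd.Series(["1", "1", "2", "2"], index=[1, 1, 2, 2])


def reindex_raises(series: pd.Series, target: pd.Index) -> bool:
    """Hypothetical helper: True if reindexing `series` to `target` raises ValueError."""
    try:
        series.reindex(target)
    except ValueError:
        return True
    return False


# Assumption: an empty target index stands in for a mask that selects no rows.
empty_target = ser.index[:0]

# Duplicate labels on the source index: reindex refuses, even for an empty target.
dup_raises = reindex_raises(ser, empty_target)

# Unique labels on the source index: the same call goes through.
unique_raises = reindex_raises(ser.reset_index(drop=True), empty_target)

print(dup_raises, unique_raises)
```

If this sketch matches what happens internally, the error depends only on the duplicate labels of the right-hand side, not on how many rows the mask selects.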

Expected Behavior

Code should execute normally with result

   0
1  1
1  1
2  2
2  2

(No reindexing should be necessary since no rows are selected with code on line 3.)

Installed Versions

INSTALLED VERSIONS

commit : ba1cccd19da778f0c3a7d6a885685da16a072870
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Ireland.1252
pandas : 2.1.0
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 65.5.1
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Mar 05 '24 15:03 rxxg

Thanks for the report!

(No reindexing should be necessary since no rows are selected with code on line 3.)

To be sure, it's not the left-hand side that is being reindexed, it's the right. E.g.

df.loc[df[0].str.len() > 1, 0] = 5

works. I believe we raise any time the RHS has duplicate index labels, because aligning it with the left-hand side can be ambiguous, even though it won't be ambiguous in every case. In general we try to avoid values-dependent behavior. Here, if the mask on the left happens to be all False, you may think the code works, only for it to fail as soon as the mask isn't all False. That can be a bad user experience.
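To illustrate the left/right distinction, here is a hedged sketch rather than a recommended fix. It assumes the caller wants positional assignment, and the mask (df[0] == "2") and the "x" suffix are made up for the example. A scalar on the right has no index to align, and converting a Series to a NumPy array with to_numpy() drops its labels, so neither goes through the reindex step.

```python
import pandas as pd

df = pd.DataFrame(index=[1, 1, 2, 2], data=["1", "1", "2", "2"])

# Illustrative mask (an assumption for this sketch): selects the two rows labelled 2.
mask = df[0] == "2"

# Scalar right-hand side: nothing to align, so duplicate labels do not matter.
df.loc[mask, 0] = "5"

# Array right-hand side: to_numpy() drops the index, so values are assigned
# by position. The mask.any() guard is a defensive assumption that skips the
# assignment entirely when no rows are selected.
if mask.any():
    df.loc[mask, 0] = (df.loc[mask, 0] + "x").to_numpy()

result = df[0].tolist()
print(result)
```

Positional assignment gives up label alignment, so it is only appropriate when the right-hand side is already in the same row order as the selected rows.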

Mar 05 '24 22:03 rhshadrach

Code executes normally with pandas versions <2.1.0

Ah, I missed this! Thanks for that detail. We should run a git blame and see where this ended up getting changed.

Mar 05 '24 22:03 rhshadrach

take

Mar 31 '24 17:03 20revsined

take

Jul 27 '24 10:07 matiaslindgren