BUG: pd.merge fails with numpy.uintc on Windows
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
This bug is fixed by adding np.uintc to _factorizers in pd.merge. Could someone please upload a fix? For me, the bug is fixed by adding the following entry at line 114 of the core/reshape/merge.py file: np.uintc: libhashtable.UInt32Factorizer.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': np.array([1, 2], dtype=np.uintc)})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'b': np.array([3, 4], dtype=np.uintc)})
df3 = df1.merge(df2, how='outer')
print(df3)
Issue Description
Traceback (most recent call last):
File "C:\Users\noname37486\PycharmProjects\pythonProject3\main.py", line 10, in
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : es_ES.cp1252

pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 57.0.0
pip : 21.1.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
I have opened a pull request for this issue; kindly check:
- #58727
xref https://github.com/pandas-dev/pandas/issues/52451#issuecomment-1720767399
From https://github.com/pandas-dev/pandas/issues/60091#issue-2609739687
This issue can be resolved by adding uintc similar to the code here for intc:
https://github.com/pandas-dev/pandas/blob/8d2ca0bf84bcf44a800ac19bdb4ed7ec88c555e2/pandas/core/reshape/merge.py#L124-L126
However, we should be checking np.dtype(np.intc).itemsize when doing so and using this to determine the right dtype to map to: if it is 4 we map to libhashtable.UInt32Factorizer and if it is 8 we map to libhashtable.UInt64Factorizer.
The resolution of this issue should also apply the same itemsize check to the np.intc case.
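A minimal sketch of the itemsize check described above, for illustration only. The helper name factorizer_name_for_c_int is hypothetical and not part of pandas. It returns the factorizer class name as a string rather than touching libhashtable directly, and the signed names (Int32Factorizer, Int64Factorizer) are assumed by analogy with the unsigned ones mentioned in this thread.

```python
import numpy as np


def factorizer_name_for_c_int(dtype) -> str:
    """Pick a factorizer class name from the dtype's itemsize.

    Hypothetical helper for illustration; not part of pandas.
    """
    dt = np.dtype(dtype)
    if dt.kind not in "iu":
        raise TypeError(f"expected an integer dtype, got {dt}")

    # Unsigned dtypes map to the UInt* family, signed ones to Int*
    # (the signed names are an assumption, see the note above).
    prefix = "UInt" if dt.kind == "u" else "Int"

    # Do not hard-code the width of a C int; ask NumPy for it.
    if dt.itemsize == 4:
        return f"{prefix}32Factorizer"
    if dt.itemsize == 8:
        return f"{prefix}64Factorizer"
    raise ValueError(f"unsupported itemsize {dt.itemsize} for {dt}")


# Resolve both C-int aliases on the current platform.
print(factorizer_name_for_c_int(np.uintc))
print(factorizer_name_for_c_int(np.intc))
```

In an actual fix, the returned name would presumably be replaced by the corresponding libhashtable class when registering np.uintc and np.intc in _factorizers.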