BUG: Adding two series with pd.Series.add can change the dtype in unexpected ways.
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["a", "a", "a", "b", "b", "b"], [1, 2, 3, 1, 2, 3]],
    names=["foo", "bar"],
)
s = pd.Series(
    index=idx,
    data=range(6),
    name="otto",
)  # this has int64 dtype, which is what I want

new_s = pd.Series(
    index=pd.Index(["a", "b"], name="foo"),
    data=[100, -100],
    name="otto",
)  # also contains int64

# Result has int64 dtype, which is correct since I'm adding ints to ints.
s.add(new_s, fill_value=0)

another_s = pd.Series(
    index=pd.Index(["a", "c"], name="foo"),
    data=[100, 50],
    name="otto",
)  # a new series of ints, but this time the index has an entry ("c") that does not exist in s

# This time the dtype changes to float64, which is wrong and creates errors downstream for me.
s.add(another_s, fill_value=0)
Issue Description
When adding together pd.Series with overlapping indices, the dtype sometimes changes from int64 to float64 when using s.add(another_s, fill_value=0). To be clear: the numbers are correct and the results are what I expect, except for the dtype now being float64.
Expected Behavior
Using s + another_s produces NaN if the indices don't all match exactly, which in turn changes the dtype from int to float. However, when using add(_, fill_value=0), I'd expect the dtype to remain int64.
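For anyone hitting the same thing, here is a minimal sketch of the behavior on a flat index. The series `left` and `right` are simplified, hypothetical stand-ins for `s` and `another_s`, not the MultiIndex data from the report. It also shows one possible workaround, assuming it is acceptable to build the union of labels yourself.

```python
import pandas as pd

# Simplified flat-index stand-ins for the series in the report (hypothetical data).
left = pd.Series([1, 2], index=["a", "b"])
right = pd.Series([10, 20], index=["a", "c"])

# add() first aligns both operands on the union of their labels. Labels missing
# on one side become NaN during alignment, which forces float64, and only then
# is fill_value substituted.
promoted = left.add(right, fill_value=0)
print(promoted.dtype)

# Possible workaround: fill while reindexing, so no NaN is ever created.
labels = left.index.union(right.index)
kept = left.reindex(labels, fill_value=0) + right.reindex(labels, fill_value=0)
print(kept.dtype)
```

Casting the result back with `.astype("int64")` is another option, since no NaN should remain once fill_value has been applied. Whether the reindex approach carries over unchanged to the MultiIndex case above is untested here.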
Installed Versions
INSTALLED VERSIONS
commit : 2a953cf80b77e4348bf50ed724f8abc0d814d9dd
python : 3.11.5.final.0
python-bits : 64
OS : Darwin
OS-release : 23.2.0
Version : Darwin Kernel Version 23.2.0: Wed Nov 15 21:55:06 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.1.3
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
I checked out the latest changes on the main branch and ran your code. I still see the dtype of s as int64.
Perhaps the issue has already been fixed in main.
I checked. On the main branch the dtype is still promoted to float64 rather than staying int64.
It seems to be intentional, as this test also verifies that when we have an integer and a None, the dtype is promoted to float64.
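The promotion mentioned above can be seen without any arithmetic. This is a small sketch with made-up values (the names `plain` and `nullable` are only for illustration): a default integer series cannot hold a missing value, so pandas falls back to float64, while the nullable "Int64" extension dtype keeps integers alongside a missing entry.

```python
import pandas as pd

# An integer plus a missing value: the default int64 dtype has no way to
# represent the gap, so the series is stored as float64 with NaN.
plain = pd.Series([1, None])
print(plain.dtype)

# The nullable extension dtype stores the gap as pd.NA and stays integer.
nullable = pd.Series([1, None], dtype="Int64")
print(nullable.dtype)
```

If float promotion is a problem downstream, working in "Int64" from the start may be worth trying, although I have not verified how it interacts with the MultiIndex example in this report.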