DataFrame.update silently does nothing when indices are of differing type
Code to reproduce
import numpy as np
import pandas as pd

df_int = pd.DataFrame(
    {'col': ['foo', 'bar', np.nan]},
    index=[1, 2, 3]
)
df_obj = pd.DataFrame(
    {'col': [np.nan, np.nan, 'baz']},
    index=['1', '2', '3']
)
print(df_int)
print(df_obj)
# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN
#    col
# 1  NaN
# 2  NaN
# 3  baz
# Note that the indices look identical, but they have different dtypes (int64 vs object)
df_int.update(df_obj)
print(df_int)
# Intended output
# >>>
#    col
# 1  foo
# 2  bar
# 3  baz
# Actual output
# >>>
#    col
# 1  foo
# 2  bar
# 3  NaN
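Until pandas reports this, one workaround is to make the index dtypes agree before calling update. The sketch below assumes the string labels in df_obj are meant to be the integers 1, 2, 3. It converts them with Index.astype and leaves the original df_obj untouched.

```python
import numpy as np
import pandas as pd

df_int = pd.DataFrame({"col": ["foo", "bar", np.nan]}, index=[1, 2, 3])
df_obj = pd.DataFrame({"col": [np.nan, np.nan, "baz"]}, index=["1", "2", "3"])

# The labels print identically, but the index dtypes differ,
# so the two indexes share no labels at all.
print(df_int.index.dtype, df_obj.index.dtype)
print(df_int.index.intersection(df_obj.index))  # an empty Index

# Workaround: convert the string labels to integers before updating.
df_obj_aligned = df_obj.copy()
df_obj_aligned.index = df_obj_aligned.index.astype("int64")

df_int.update(df_obj_aligned)
print(df_int)  # col is now foo, bar, baz
```

Converting the labels on a copy is deliberate. It keeps the caller's df_obj unchanged and makes the label interpretation explicit instead of relying on any implicit coercion.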
Problem description
Because update aligns on index values, two DataFrames whose index dtypes differ may share no labels at all. That is not what the user expects, and nothing tells them it happened. It is especially surprising when the indices look identical, as in the example above. A warning should be raised that either:
- tells the user that the indices are not the same type, which may produce unintended results, or
- states that a type comparison is taking place that will never produce any matches.
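A user-side sketch of the two proposed warnings follows. checked_update is a hypothetical helper, not a pandas API. The dtype check is only a heuristic, because differing dtypes do not always mean that no labels can match. The empty-intersection check is definitive. In that case the helper skips the call, since update would change nothing.

```python
import warnings

import numpy as np
import pandas as pd


def checked_update(target: pd.DataFrame, other: pd.DataFrame) -> bool:
    """Hypothetical wrapper (not a pandas API): warn, rather than stay
    silent, when ``target.update(other)`` cannot match any index labels.

    Returns True if the update was applied, False if it was skipped.
    """
    if target.index.dtype != other.index.dtype:
        # Heuristic only: differing dtypes *may* mean no labels match.
        warnings.warn(
            f"index dtypes differ ({target.index.dtype} vs {other.index.dtype}); "
            "labels that print identically may not match",
            UserWarning,
            stacklevel=2,
        )
    if target.index.intersection(other.index).empty:
        # Definitive: no label is shared, so update() could change nothing.
        warnings.warn(
            "no index labels in common; skipping update()",
            UserWarning,
            stacklevel=2,
        )
        return False
    target.update(other)
    return True


df_int = pd.DataFrame({"col": ["foo", "bar", np.nan]}, index=[1, 2, 3])
df_obj = pd.DataFrame({"col": [np.nan, np.nan, "baz"]}, index=["1", "2", "3"])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    applied = checked_update(df_int, df_obj)

for w in caught:
    print(w.category.__name__, w.message)
```

With the reproducer's frames, both warnings fire and df_int is left unchanged.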
Related to #4094
Re "states that a type comparison is taking place that will never produce any matches": this is hard to do in general. In your case, a regular (object-dtype) Index can contain objects of any type, including integers, the same type as the values in the Int64Index.
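The following example illustrates why a dtype check alone cannot prove there will be no matches. It uses plain pandas Index objects with made-up labels. Two object-dtype indexes are intersected with the same int64 index. Only the one that actually holds integers shares labels with it.

```python
import pandas as pd

int_index = pd.Index([1, 2, 3])  # int64 dtype

# Object-dtype indexes can hold values of any type, including ints.
mixed_index = pd.Index([1, "2", 3], dtype=object)
str_index = pd.Index(["1", "2", "3"], dtype=object)

# Both candidates have object dtype, yet only the mixed one shares labels
# with int_index: matching depends on the values, not just the dtype.
shared_mixed = int_index.intersection(mixed_index)
shared_str = int_index.intersection(str_index)

print(list(shared_mixed))  # [1, 3]
print(list(shared_str))    # []
```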
@TomAugspurger: That being said, we shouldn't be aligning on indices that are clearly not equal (e.g. string vs. numeric), so this is still a bug IMO.
On main, the df_int.update(df_obj) call now correctly raises ValueError: "Update not allowed when the index on other has no intersection with this dataframe". This could use a test (first check whether one already exists).
Looks like a test already exists: https://github.com/pandas-dev/pandas/blob/25934d6755dda44f192e4893106316bc4dce5382/pandas/tests/frame/methods/test_update.py#L218