BUG: `numpy.ma.fix_invalid` makes changes in-place in numpy 2.1.0 even with `copy=True`
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd, numpy as np
>>> pd.__version__; np.__version__
'2.2.2'
'2.1.0'
>>> my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
>>> my_series
0 1.0
1 2.0
2 NaN
3 0.0
4 1.0
dtype: float64
>>> np.ma.fix_invalid(my_series)
masked_array(data=[1.0, 2.0, --, 0.0, 1.0],
mask=[False, False, True, False, False],
fill_value=1e+20)
>>> my_series
0 1.000000e+00
1 2.000000e+00
2 1.000000e+20
3 0.000000e+00
4 1.000000e+00
dtype: float64
Issue Description
Copying the description of: https://github.com/numpy/numpy/issues/27253
numpy.ma.fix_invalid behaves differently between NumPy 2.1.0 and NumPy 2.0.0. Specifically, when passing a pandas Series containing a numpy.nan value, numpy.ma.fix_invalid now makes changes in-place, even if the copy argument is set to its default value of True. This issue occurs only with pandas Series, not with NumPy arrays, for example.
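Until a fix is available, one possible workaround is to hand `fix_invalid` a plain NumPy array that is already an independent copy of the Series data. The sketch below assumes that `Series.to_numpy(copy=True)` returns a real copy on the affected versions; that is not confirmed in this thread, so the final check is worth running in your own environment.

```python
import numpy as np
import pandas as pd

my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])

# Ask pandas for an explicit copy instead of letting NumPy convert the Series.
# Assumption: to_numpy(copy=True) is not affected by the copy-keyword problem.
values = my_series.to_numpy(copy=True)

# fix_invalid now operates on an ndarray that is independent of the Series.
fixed = np.ma.fix_invalid(values)

# The NaN in the original Series should still be there.
print(np.isnan(my_series.iloc[2]))
```

Because `fixed` is built from the copied array, any in-place writes can no longer reach the data backing `my_series`.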
Expected Behavior
>>> pd.__version__; np.__version__
'2.2.2'
'2.0.0'
>>>
>>> my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
>>> my_series
0 1.0
1 2.0
2 NaN
3 0.0
4 1.0
dtype: float64
>>> np.ma.fix_invalid(my_series)
masked_array(data=[1.0, 2.0, --, 0.0, 1.0],
mask=[False, False, True, False, False],
fill_value=1e+20)
>>> my_series
0 1.0
1 2.0
2 NaN
3 0.0
4 1.0
dtype: float64
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:13:18 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.2
numpy : 2.1.0
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : 8.3.1
hypothesis : 6.108.4
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Thanks for the report - this seems to be the core of the issue.
my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
arr = np.array(my_series, copy=True)
print(np.shares_memory(my_series._values, arr))
# True
#57172 looks related.
This might be on our side: do we still have to implement the `copy` keyword properly? See https://github.com/pandas-dev/pandas/issues/57739
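To illustrate what honoring the `copy` keyword could look like, here is a minimal sketch of an `__array__` method on a hypothetical container. The `Wrapper` class and its `_values` attribute are made up for this example; this is not pandas code and not necessarily how the fix was implemented.

```python
import numpy as np


class Wrapper:
    """Hypothetical array container, used only to illustrate the copy keyword."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=float)

    def __array__(self, dtype=None, copy=None):
        if copy is True:
            # The caller explicitly asked for a copy: never return our own buffer.
            return np.array(self._values, dtype=dtype, copy=True)
        arr = np.asarray(self._values, dtype=dtype)
        if copy is False and not np.shares_memory(arr, self._values):
            # A dtype conversion forced a copy although the caller forbade one.
            raise ValueError("a copy is required but copy=False was requested")
        return arr


wrapped = Wrapper([1.0, 2.0, np.nan, 0.0, 1.0])
arr = np.array(wrapped, copy=True)
print(np.shares_memory(wrapped._values, arr))
```

With an implementation along these lines, `np.array(obj, copy=True)` should not share memory with the wrapped data, whether NumPy passes `copy=True` through to `__array__` or makes the copy itself.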
FWIW, I have the same issue with np.nan_to_num, which would be expected if it's a broader issue.
@jorisvandenbossche @seberg are you happy to close this (and the numpy issue) as duplicate/fixed, or are additional tests needed before closing?
The two code samples in this discussion now work on 2.3.x and main; I assume they were fixed by #60046.
Yes, this is indeed fixed by https://github.com/pandas-dev/pandas/pull/60046 and will be included in the upcoming pandas 2.3.0.