pandas BUG: `numpy.ma.fix_invalid` makes changes in-place in numpy 2.1.0 even with `copy=True`

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd, numpy as np
>>> pd.__version__; np.__version__
'2.2.2'
'2.1.0'
>>> my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
>>> my_series
0    1.0
1    2.0
2    NaN
3    0.0
4    1.0
dtype: float64
>>> np.ma.fix_invalid(my_series)
masked_array(data=[1.0, 2.0, --, 0.0, 1.0],
             mask=[False, False,  True, False, False],
       fill_value=1e+20)
>>> my_series
0    1.000000e+00
1    2.000000e+00
2    1.000000e+20
3    0.000000e+00
4    1.000000e+00
dtype: float64

Issue Description

Copying the description of: https://github.com/numpy/numpy/issues/27253

numpy.ma.fix_invalid behaves differently between NumPy 2.1.0 and NumPy 2.0.0. Specifically, when passing a pandas Series containing a numpy.nan value, numpy.ma.fix_invalid now makes changes in-place, even if the copy argument is set to its default value of True. This issue occurs only with pandas Series, not with NumPy arrays, for example.
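Until a fix is available, one possible workaround is to hand `numpy.ma.fix_invalid` an array that is already an independent copy. The sketch below assumes `Series.to_numpy(copy=True)` returns a buffer that does not share memory with the Series; `fix_invalid_safely` is a hypothetical helper name and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def fix_invalid_safely(series: pd.Series) -> np.ma.MaskedArray:
    """Hypothetical helper: mask NaN/inf values without touching the input Series."""
    # Take an explicit copy first so nothing below can write into the Series buffer.
    values = series.to_numpy(dtype=float, copy=True)
    # copy=False is acceptable here because `values` is already a private copy.
    return np.ma.fix_invalid(values, copy=False)


my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
masked = fix_invalid_safely(my_series)

print(masked)
print(my_series)  # the NaN at position 2 should still be present
```

The explicit copy costs one extra allocation, but it does not rely on how `np.array(..., copy=True)` interacts with `Series.__array__`.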

Expected Behavior

>>> pd.__version__; np.__version__
'2.2.2'
'2.0.0'
>>> 
>>> my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
>>> my_series
0    1.0
1    2.0
2    NaN
3    0.0
4    1.0
dtype: float64
>>> np.ma.fix_invalid(my_series)
masked_array(data=[1.0, 2.0, --, 0.0, 1.0],
             mask=[False, False,  True, False, False],
       fill_value=1e+20)
>>> my_series
0    1.0
1    2.0
2    NaN
3    0.0
4    1.0
dtype: float64

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:13:18 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6030
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.2.2
numpy : 2.1.0
pytz : 2024.1
dateutil : 2.9.0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : 8.3.1
hypothesis : 6.108.4
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Aug 26 '24 15:08 kounelisagis

Thanks for the report - this seems to be the core of the issue.

import numpy as np
import pandas as pd

my_series = pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])
arr = np.array(my_series, copy=True)
print(np.shares_memory(my_series._values, arr))
# True: despite copy=True, arr still shares memory with the Series

Aug 26 '24 21:08 rhshadrach

#57172 looks related.

Aug 26 '24 21:08 rhshadrach

Might this be on our side, in that we still have to implement the copy keyword properly? See https://github.com/pandas-dev/pandas/issues/57739

Aug 27 '24 17:08 jorisvandenbossche

FWIW, I have the same issue with np.nan_to_num, which would be expected if it's a broader issue.
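To check several NumPy functions at once, a small value-level probe can help. This is only a sketch: `mutates_input` and `make_series` are hypothetical helper names, and the printed results depend on the installed pandas and NumPy versions, so no expected output is given.

```python
from typing import Callable

import numpy as np
import pandas as pd


def mutates_input(func: Callable[[pd.Series], object], series: pd.Series) -> bool:
    """Hypothetical helper: report whether calling func(series) changed the Series."""
    snapshot = series.copy(deep=True)
    func(series)
    # Series.equals treats NaNs in the same positions as equal.
    return not series.equals(snapshot)


def make_series() -> pd.Series:
    """Build a fresh copy of the Series used in the report."""
    return pd.Series([1.0, 2.0, np.nan, 0.0, 1.0])


# Both functions default to copy=True, so neither should report a mutation.
for func in (np.ma.fix_invalid, np.nan_to_num):
    print(func.__name__, mutates_input(func, make_series()))
```

A result of `True` for either function would indicate that the input Series was modified in place in that environment.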

Sep 25 '24 18:09 kevbutler

@jorisvandenbossche @seberg

Are you happy to close this (and the numpy issue) as duplicate/fixed, or are additional tests needed before closing?

The two code samples in this discussion now work correctly on 2.3.x and main; I assume this was fixed by #60046.

Nov 15 '24 12:11 simonjayhawkins

Yes, this is indeed fixed by https://github.com/pandas-dev/pandas/pull/60046, and the fix will be in the upcoming pandas 2.3.0.
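For code that has to run on mixed environments in the meantime, a version check along these lines could flag the combination reported here. The boundaries are assumptions drawn only from this thread (reproduced with NumPy 2.1.0 and pandas 2.2.2, not with NumPy 2.0.0, fix expected in pandas 2.3.0); `likely_affected` and `parse_major_minor` are hypothetical helper names.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.2.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def likely_affected(pandas_version: str, numpy_version: str) -> bool:
    """Hypothetical check for the version combination reported in this thread.

    Assumed boundaries (not confirmed for every release):
    NumPy >= 2.1 together with pandas < 2.3.
    """
    return (
        parse_major_minor(numpy_version) >= (2, 1)
        and parse_major_minor(pandas_version) < (2, 3)
    )


print(likely_affected("2.2.2", "2.1.0"))  # True
print(likely_affected("2.2.2", "2.0.0"))  # False
print(likely_affected("2.3.0", "2.1.0"))  # False
```

In practice the strings would come from `pd.__version__` and `np.__version__`; a behavioural check is more reliable than a version comparison where one is feasible.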

Nov 16 '24 19:11 jorisvandenbossche