BUG: masked std and median on unmasked array result in invalid masked array
Describe the issue:
Since NumPy 1.24, the code example below produces a masked array whose data array and mask array do not have the same shape. Printing that array then raises a ValueError.
Reproduce the code example:
import numpy as np
print(np.__version__)
rng = np.random.default_rng(0)
data = rng.normal(size=(2, 101))
data[:, 2] = np.nan
std = np.ma.std(data, axis=1)
median = np.ma.median(data, axis=1)
print("median:")
print(repr(median))
print("std:")
print(repr(std))
deviation = data - median[:, np.newaxis]
comparison = deviation < 0.5 * std[:, np.newaxis]
print(comparison.shape, comparison.mask.shape)
print(comparison)
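Until this is fixed, the broken result can be detected before it reaches print(). The sketch below reruns the reproducer and checks the invariant that a masked array's mask is either nomask or has the same shape as its data. The helper name mask_shape_ok is hypothetical and not part of numpy.ma. Judging by the (2, 101) (2, 1) line in the 1.25 output below, it would report False for comparison on the affected versions.

```python
import numpy as np


def mask_shape_ok(arr):
    """Return True if the mask is nomask or has the same shape as the data.

    Hypothetical helper for this report; not part of numpy.ma.
    """
    mask = np.ma.getmask(arr)
    # nomask is a valid "nothing is masked" mask for any data shape
    if mask is np.ma.nomask:
        return True
    return mask.shape == np.shape(arr)


rng = np.random.default_rng(0)
data = rng.normal(size=(2, 101))
data[:, 2] = np.nan

std = np.ma.std(data, axis=1)
median = np.ma.median(data, axis=1)
comparison = (data - median[:, np.newaxis]) < 0.5 * std[:, np.newaxis]

# Check the invariant instead of hitting the ValueError inside __str__
print(comparison.shape, np.ma.getmask(comparison).shape, mask_shape_ok(comparison))
```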
Error message:
Output under 1.23:
1.23.5
median:
masked_array(data=[nan, nan],
mask=False,
fill_value=1e+20)
std:
masked_array(data=[--, --],
mask=[ True, True],
fill_value=1e+20,
dtype=float64)
(2, 101) (2, 101)
[[-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- --]
[-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- --]]
Output under 1.25 (also 1.24):
1.25.2
median:
masked_array(data=[nan, nan],
mask=False,
fill_value=1e+20)
std:
masked_array(data=[--, --],
mask=[ True, True],
fill_value=1e+20,
dtype=float64)
(2, 101) (2, 1)
Traceback (most recent call last):
File "/home/mnoethe/test_numpy_ma_std.py", line 23, in <module>
print(comparison)
File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 3997, in __str__
return str(self._insert_masked_print())
File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 3991, in _insert_masked_print
_recursive_printoption(res, mask, masked_print_option)
File "/home/mnoethe/.local/conda/envs/numpy-1.25/lib/python3.10/site-packages/numpy/ma/core.py", line 2437, in _recursive_printoption
np.copyto(result, printopt, where=mask)
ValueError: could not broadcast where mask from shape (2,2) into shape (2,100)
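As a stopgap, the mask of such a result can be broadcast back to the data shape before printing. This is a sketch, assuming the mask is broadcastable to the data (as (2, 1) is to (2, 101) here). with_full_mask is a hypothetical helper, not a numpy.ma function, and it only patches the result; it does not address the underlying bug.

```python
import numpy as np


def with_full_mask(arr):
    """Return a copy of a masked array whose mask is broadcast to the data shape.

    Hypothetical workaround helper, not a numpy.ma API.
    """
    mask = np.ma.getmaskarray(arr)
    # broadcast_to returns a read-only view, so copy it before using it as a mask
    full_mask = np.broadcast_to(mask, arr.shape).copy()
    return np.ma.masked_array(np.ma.getdata(arr), mask=full_mask,
                              fill_value=arr.fill_value)
```

Usage with the reproducer would be print(with_full_mask(comparison)).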
Runtime information:
[{'numpy_version': '1.25.2', 'python': '3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) ' '[GCC 12.3.0]', 'uname': uname_result(system='Linux', node='e5b-dell-12', release='5.14.0-1051-oem', version='#58-Ubuntu SMP Fri Aug 26 05:50:00 UTC 2022', machine='x86_64')}, {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'], 'found': ['SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2'], 'not_found': ['AVX512F', 'AVX512CD', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL', 'AVX512_SPR']}}, {'architecture': 'Haswell', 'filepath': '/home/mnoethe/.local/conda/envs/numpy-1.25/lib/libopenblasp-r0.3.23.so', 'internal_api': 'openblas', 'num_threads': 20, 'prefix': 'libopenblas', 'threading_layer': 'pthreads', 'user_api': 'blas', 'version': '0.3.23'}]
Context for the issue:
Most confusingly, the example above runs without error under NumPy 1.25 if the data array has shape (2, 100), i.e. just one element fewer in the last dimension.
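To narrow down the size dependence, one could compare the data and mask shapes for both sizes rather than relying on whether printing succeeds. That would show whether the (2, 100) case actually has a consistent mask or whether only the printing path happens to succeed. comparison_shapes is a hypothetical helper that wraps the reproducer from the report.

```python
import numpy as np


def comparison_shapes(n_columns, seed=0):
    """Rebuild the comparison from the report for a given number of columns.

    Returns (data shape, mask shape). Hypothetical helper, not a numpy API.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(size=(2, n_columns))
    data[:, 2] = np.nan
    std = np.ma.std(data, axis=1)
    median = np.ma.median(data, axis=1)
    comparison = (data - median[:, np.newaxis]) < 0.5 * std[:, np.newaxis]
    return comparison.shape, np.ma.getmask(comparison).shape


for n in (100, 101):
    print(n, comparison_shapes(n))
```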
I'm not entirely sure whether this solves your problem, but I think this is resolved in the most recent development version, 2.0.0.dev0+git20230830.b73a5ae.
Why do std and median have different masks?
Why is the median NaN unmasked but the std masked?
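On the std/median difference: as far as I can tell from numpy/ma/core.py, MaskedArray.std takes the square root of the variance with np.ma.sqrt, and masked unary functions such as np.ma.sqrt mask every entry whose result is not finite. The median is returned as-is, so its NaN stays unmasked. That reading is worth double-checking; the snippet below only demonstrates the np.ma.sqrt behaviour itself.

```python
import numpy as np

values = np.array([np.nan, 4.0])

# Plain NumPy: NaN propagates as an ordinary value
plain_root = np.sqrt(values)
print(plain_root)

# numpy.ma's sqrt masks entries whose result is not finite
# (and entries outside its domain, i.e. negative inputs)
masked_root = np.ma.sqrt(values)
print(masked_root)
print(masked_root.mask)
```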
I noticed this bug and would like to take a closer look to see if I can provide a solution. It might be caused by an issue in sqrt. I'll work on a potential fix and submit a PR if I make progress.
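A possible workaround, assuming the NaNs are meant to be treated as missing values: mask them explicitly with np.ma.masked_invalid before computing the statistics. The input then carries a full-shape mask, which the later arithmetic and comparison inherit, so the result's mask matches its data. Note that this changes the computation: std and median then ignore the NaN column instead of returning NaN or masked values, so it sidesteps the regression rather than fixing it.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2, 101))
data[:, 2] = np.nan

# Assumption: NaNs should be treated as missing values, so mask them up front.
# masked_invalid gives a full-shape mask that later operations inherit.
masked = np.ma.masked_invalid(data)

std = masked.std(axis=1)
median = np.ma.median(masked, axis=1)
deviation = masked - median[:, np.newaxis]
comparison = deviation < 0.5 * std[:, np.newaxis]

print(comparison.shape, comparison.mask.shape)
print(comparison)
```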