matplotlib icon indicating copy to clipboard operation
matplotlib copied to clipboard

Use old stride_windows implementation on 32-bit builds

Open QuLogic opened this issue 1 year ago • 5 comments

PR summary

I've long had the patch on Fedora (since https://github.com/matplotlib/matplotlib/pull/21190#issuecomment-1223271888), but it's now applicable to WASM as well (#29093), which is 32-bit. The older implementation doesn't OOM.

cc @anntzer as original author of that PR in case you have an alternate implementation idea.

PR checklist

QuLogic avatar Nov 09 '24 08:11 QuLogic

It's not really clear to me why sliding_window_view (in the way we use it) would lead to an OOM while the manual approach wouldn't?

anntzer avatar Nov 09 '24 11:11 anntzer

Perhaps there is a NumPy calculation bug? It ends up as:

__________ test_psd_csd[png] __________

    @image_comparison(
        ["psd_freqs.png", "csd_freqs.png", "psd_noise.png", "csd_noise.png"],
        remove_text=True, tol=0.002)
    def test_psd_csd():
        n = 10000
        Fs = 100.
    
        fstims = [[Fs/4, Fs/5, Fs/11], [Fs/4.7, Fs/5.6, Fs/11.9]]
        NFFT_freqs = int(1000 * Fs / np.min(fstims))
        x = np.arange(0, n, 1/Fs)
        ys_freqs = np.sin(2 * np.pi * np.multiply.outer(fstims, x)).sum(axis=1)
    
        NFFT_noise = int(1000 * Fs / 11)
        np.random.seed(0)
        ys_noise = [np.random.standard_normal(n), np.random.rand(n)]
    
        all_kwargs = [{"sides": "default"},
                      {"sides": "onesided", "return_line": False},
                      {"sides": "twosided", "return_line": True}]
        for ys, NFFT in [(ys_freqs, NFFT_freqs), (ys_noise, NFFT_noise)]:
            noverlap = NFFT // 2
            pad_to = int(2 ** np.ceil(np.log2(NFFT)))
            for ax, kwargs in zip(plt.figure().subplots(3), all_kwargs):
>               ret = ax.psd(np.concatenate(ys), NFFT=NFFT, Fs=Fs,
                             noverlap=noverlap, pad_to=pad_to, **kwargs)

../venv-test/lib/python3.12/site-packages/matplotlib/tests/test_axes.py:5529: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../venv-test/lib/python3.12/site-packages/matplotlib/_api/deprecation.py:453: in wrapper
    return func(*args, **kwargs)
../venv-test/lib/python3.12/site-packages/matplotlib/__init__.py:1521: in inner
    return func(
../venv-test/lib/python3.12/site-packages/matplotlib/axes/_axes.py:7616: in psd
    pxx, freqs = mlab.psd(x=x, NFFT=NFFT, Fs=Fs, detrend=detrend,
../venv-test/lib/python3.12/site-packages/matplotlib/mlab.py:511: in psd
    Pxx, freqs = csd(x=x, y=None, NFFT=NFFT, Fs=Fs, detrend=detrend,
../venv-test/lib/python3.12/site-packages/matplotlib/mlab.py:567: in csd
    Pxy, freqs, _ = _spectral_helper(x=x, y=y, NFFT=NFFT, Fs=Fs,
../venv-test/lib/python3.12/site-packages/matplotlib/mlab.py:307: in _spectral_helper
    result = np.lib.stride_tricks.sliding_window_view(
../venv-test/lib/python3.12/site-packages/numpy/lib/stride_tricks.py:336: in sliding_window_view
    return as_strided(x, strides=out_strides, shape=out_shape,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = array([ 0.        ,  2.49169733,  1.49741725, ...,  1.04870198,
        0.57119919, -0.18319108]), shape = (1988101, 11900), strides = (8, 8), subok = False, writeable = False

    def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
        """
        Create a view into the array with the given shape and strides.
    
        .. warning:: This function has to be used with extreme care, see notes.
    
        Parameters
        ----------
        x : ndarray
            Array to create a new.
        shape : sequence of int, optional
            The shape of the new array. Defaults to ``x.shape``.
        strides : sequence of int, optional
            The strides of the new array. Defaults to ``x.strides``.
        subok : bool, optional
            .. versionadded:: 1.10
    
            If True, subclasses are preserved.
        writeable : bool, optional
            .. versionadded:: 1.12
    
            If set to False, the returned array will always be readonly.
            Otherwise it will be writable if the original array was. It
            is advisable to set this to False if possible (see Notes).
    
        Returns
        -------
        view : ndarray
    
        See also
        --------
        broadcast_to : broadcast an array to a given shape.
        reshape : reshape an array.
        lib.stride_tricks.sliding_window_view :
            userfriendly and safe function for the creation of sliding window views.
    
        Notes
        -----
        ``as_strided`` creates a view into the array given the exact strides
        and shape. This means it manipulates the internal data structure of
        ndarray and, if done incorrectly, the array elements can point to
        invalid memory and can corrupt results or crash your program.
        It is advisable to always use the original ``x.strides`` when
        calculating new strides to avoid reliance on a contiguous memory
        layout.
    
        Furthermore, arrays created with this function often contain self
        overlapping memory, so that two elements are identical.
        Vectorized write operations on such arrays will typically be
        unpredictable. They may even give different results for small, large,
        or transposed arrays.
    
        Since writing to these arrays has to be tested and done with great
        care, you may want to use ``writeable=False`` to avoid accidental write
        operations.
    
        For these reasons it is advisable to avoid ``as_strided`` when
        possible.
        """
        # first convert input to array, possibly keeping subclass
        x = np.array(x, copy=False, subok=subok)
        interface = dict(x.__array_interface__)
        if shape is not None:
            interface['shape'] = tuple(shape)
        if strides is not None:
            interface['strides'] = tuple(strides)
    
>       array = np.asarray(DummyArray(interface, base=x))
E       ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.

../venv-test/lib/python3.12/site-packages/numpy/lib/stride_tricks.py:105: ValueError

QuLogic avatar Nov 09 '24 12:11 QuLogic

Oh, I see, this is because the intermediate array is too large even though we slice it immediately (to compute the overlapping FTs); also it seems like numpy wants array.size * array.itemsize to be representable even though that may be much bigger than the physical array size. That seems overall related to the request for step_size at https://github.com/numpy/numpy/issues/18244.

I guess the easy way out is indeed to go back to as_strided.

anntzer avatar Nov 09 '24 13:11 anntzer

I thought mlab was being deprecated at some point. How useful is this to add this code piece, versus adding a pytest.skipif(sys.maxsize < 2**32) on the failing tests and suggesting users to do this themselves if they want to do large array calculations on 32-bit systems?

greglucas avatar Nov 12 '24 18:11 greglucas

Fair enough; I don't know what the status of the deprecations are at this point. I will say that this is reverting to the pre-#21190 code, so it's not new, and I've been using the patch on Fedora without issue since that PR, so it's been stable AFAICT.

QuLogic avatar Nov 15 '24 10:11 QuLogic