scipy icon indicating copy to clipboard operation
scipy copied to clipboard

BUG: `scipy.stats.ttest_1samp` raises an error when using `keepdims=True`

Open sdiebolt opened this issue 1 year ago • 5 comments

Describe your issue.

Setting keepdims=True when using scipy.stats.ttest_1samp leads to an error.

Reproducing Code Example

import numpy as np
from scipy import stats

rng = np.random.default_rng()

rvs = stats.uniform.rvs(size=(10, 10, 10), random_state=rng)

stats.ttest_1samp(rvs, popmean=0.5, axis=2, keepdims=True)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sdiebolt/Documents/Personal/scipy-error/.venv/lib/python3.12/site-packages/scipy/stats/_axis_nan_policy.py", line 565, in axis_nan_policy_wrapper
    res = _add_reduced_axes(res, reduced_axes, keepdims)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sdiebolt/Documents/Personal/scipy-error/.venv/lib/python3.12/site-packages/scipy/stats/_axis_nan_policy.py", line 247, in _add_reduced_axes
    return ([np.expand_dims(output, reduced_axes) for output in res]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sdiebolt/Documents/Personal/scipy-error/.venv/lib/python3.12/site-packages/numpy/lib/shape_base.py", line 597, in expand_dims
    axis = normalize_axis_tuple(axis, out_ndim)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sdiebolt/Documents/Personal/scipy-error/.venv/lib/python3.12/site-packages/numpy/core/numeric.py", line 1380, in normalize_axis_tuple
    axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.exceptions.AxisError: axis 2 is out of bounds for array of dimension 1

SciPy/NumPy/Python version and system information

1.13.0 1.26.4 sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0)
/home/sdiebolt/Documents/Personal/scipy-error/.venv/lib/python3.12/site-packages/scipy/__config__.py:154: UserWarning: Install `pyyaml` for better output
  warnings.warn("Install `pyyaml` for better output", stacklevel=1)
{
  "Compilers": {
    "c": {
      "name": "gcc",
      "linker": "ld.bfd",
      "version": "10.2.1",
      "commands": "cc"
    },
    "cython": {
      "name": "cython",
      "linker": "cython",
      "version": "3.0.10",
      "commands": "cython"
    },
    "c++": {
      "name": "gcc",
      "linker": "ld.bfd",
      "version": "10.2.1",
      "commands": "c++"
    },
    "fortran": {
      "name": "gcc",
      "linker": "ld.bfd",
      "version": "10.2.1",
      "commands": "gfortran"
    },
    "pythran": {
      "version": "0.15.0",
      "include directory": "../../tmp/pip-build-env-giifv1os/overlay/lib/python3.12/site-packages/pythran"
    }
  },
  "Machine Information": {
    "host": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    },
    "build": {
      "cpu": "x86_64",
      "family": "x86_64",
      "endian": "little",
      "system": "linux"
    },
    "cross-compiled": false
  },
  "Build Dependencies": {
    "blas": {
      "name": "openblas",
      "found": true,
      "version": "0.3.26.dev",
      "detection method": "pkgconfig",
      "include directory": "/usr/local/include",
      "lib directory": "/usr/local/lib",
      "openblas configuration": "USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=64",
      "pc file directory": "/usr/local/lib/pkgconfig"
    },
    "lapack": {
      "name": "openblas",
      "found": true,
      "version": "0.3.26.dev",
      "detection method": "pkgconfig",
      "include directory": "/usr/local/include",
      "lib directory": "/usr/local/lib",
      "openblas configuration": "USE_64BITINT=0 DYNAMIC_ARCH=1 DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE= NO_AFFINITY=1 USE_OPENMP= ZEN MAX_THREADS=64",
      "pc file directory": "/usr/local/lib/pkgconfig"
    },
    "pybind11": {
      "name": "pybind11",
      "version": "2.12.0",
      "detection method": "config-tool",
      "include directory": "unknown"
    }
  },
  "Python Information": {
    "path": "/opt/python/cp312-cp312/bin/python",
    "version": "3.12"
  }
}

sdiebolt avatar May 16 '24 09:05 sdiebolt

I think I managed to find what the problem is, I'll try a PR if it's alright!

sdiebolt avatar May 16 '24 10:05 sdiebolt

I'll try a PR if it's alright!

sure, go for it!

lucascolley avatar May 16 '24 10:05 lucascolley

LMK if you're still working on this, otherwise I'll likely fix it tonight. There are some features that have been added since _add_reduced_axes was written that store non-array private attributes in the result object, and these are commplaining when we try to add dimensions to them. I don't think it's worth a very deep dive to fix, since the whole decorator will likely be rewritten with the array API work that is going on. The fastest fix would probably be to add a heuristic (which will become a new convention) to determine which attributes do not need dimensions added. For example, in this case, _alternative is a Python int rather than a NumPy scalar or array - maybe that can be the signal that dimensions should not be added to it. That way, we can restrict the changes to the _add_reduced_axes function itself. I don't think there are very many cases we need to consider, so as long as we document it and expand the tests, it should be fine. Currently, we're only testing keepdims for two functions - which is unusual because the rest of the features of the _axis_nan_policy decorator are tested for every function to which they're applied. I think this was done to save CI time. But we might as well add all the functions in there, at least as a slow test.

mdhaber avatar May 16 '24 18:05 mdhaber

Indeed, I thought about adding such a heuristic, e.g., expanding dims only from arrays and not from any other types. However I was unsure of the impact of such a change and was trying to understand the implications, since _add_reduced_axes seemed to be used by many other functions. I won't be able to push a PR until tomorrow anyway, so if you're more comfortable with the impacts and able to do it tonight, go for it! 😉

sdiebolt avatar May 16 '24 18:05 sdiebolt

No, go ahead and try that approach when you can, and I can review. I don't think you can only expand dims from arrays because you will want to expand NumPy scalar types, too. But see what happens if you just look out for Python ints e.g.

    return ([(np.expand_dims(output, reduced_axes) if not isinstance(output, int) else output) for output in res]
            if keepdims else res)

And add all the other functions in axis_nan_policy_cases to test_keepdims: https://github.com/scipy/scipy/blob/7dcd8c59933524986923cde8e9126f5fc2e6b30b/scipy/stats/tests/test_axis_nan_policy.py#L487

That should smoke out any other cases I'm not thinking of. But IIRC there should not be too many cases - it really might be just the few functions that return a result with this _alternative parameter.

mdhaber avatar May 16 '24 18:05 mdhaber

gh-20734 fixes the bug for ttest_1samp, but I'll leave this open to remind us to extend the test to other functions and remove the warning filters.

mdhaber avatar May 30 '24 21:05 mdhaber