xarray icon indicating copy to clipboard operation
xarray copied to clipboard

Which scipy.stats functions can be used with xarray.reduce()?

Open atreguier opened this issue 2 years ago • 2 comments

What is your issue?

I would like to use scipy.stats.skew and scipy.stats.kurt with multidimensional xarrays, and also with results of the "coarsen" function.

I don't see information about which functions can be used in the documentation, and I find that the errors that I get are difficult to understand (for me).

I'm not sure it is a bug, though, so I report it as an issue. Here is a test code:

import xarray as xr
import scipy.stats
import numpy as np
#  define random dataarrays, 1D and 2D -- 
da = xr.DataArray( np.random.rand(100),dims=['x'])
da2 = xr.DataArray( np.random.rand(10,10),dims=['x','y'])
#  Work on the whole one dimensional array, OK -- 
print('skewness 1D:',da.reduce(func=scipy.stats.skew).values)
#
#  with 2D array it does not work -- 
print('skewness 2D:',da2.reduce(func=scipy.stats.skew).values)
# This gives error: ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1 --
#
#  try to compute the skewness in bins of 10 using coarsen: does not work --
daskew=da.coarsen(x=10).reduce(func=scipy.stats.skew)
#  gives error: TypeError: tuple indices must be integers or slices, not tuple -- 
#
daskew=da.coarsen(x=10).reduce(func=scipy.stats.skew,axis=None)
#  gives error:  TypeError: skew() got multiple values for keyword argument 'axis'

atreguier avatar Aug 02 '22 07:08 atreguier

The general requirement on reduce functions is that they behave like numpy ufuncs, e.g. np.mean (xarray does not support the other ways to call ufuncs, e.g. np.mul.reduceat). As far as I know, that means that they have to take at least the data and a axis argument that can either be a int, a tuple of int, or None (the default). We could definitely improve the documentation on this.

The scipy.stats functions don't seem to be true ufuncs: unlike numpy's functions, they will by default not reduce along all axes (axis=None) but along the first axis (axis=0).

To be able to use them with the standard reduce, you'd have to write a wrapper that changes that default:

def skew(data, *, axis=None, **kwargs):
    return scipy.stats.skew(data, axis=axis, **kwargs)

da2.reduce(skew)  # works

Edit: to make this a bit easier, you can also use a decorator:

def change_axis_default(func):
    def wrapper(*args, axis=None, **kwargs):
        return func(*args, axis=axis, **kwargs)

    # can't use functools.wraps because that will pretend to have the old signature
    wrapper.__doc__ = func.__doc__
    wrapper.__name__ = func.__name__
    wrapper.__qualname__ = func.__qualname__
    return wrapper

da2.reduce(change_axis_default(scipy.stats.skew))

Edit: with scipy>=1.9.0 the scipy.stats functions support tuple axis, so the second hack will not be needed anymore.

hack for scipy
def stack_axes(arr, axis):
    if isinstance(axis, int):
        axis = (axis,)

    remainder = tuple(a for a in range(arr.ndim) if a not in axis)
    transposed = np.transpose(arr, remainder + axis)
    new_shape = transposed.shape[: len(remainder)] + (-1,)
    return np.reshape(transposed, new_shape)

def skew(data, *, axis=None, **kwargs):
    if isinstance(axis, tuple):
        data = stack_axes(data, axis=axis)
        axis = -1

    return scipy.stats.skew(data, axis=axis, **kwargs)

da.coarsen(x=10).reduce(func=skew)  # works
da.reduce(func=skew)  # still works

Finally, the last call fails because coarsen(...).reduce already passes a axis kwarg so passing one again will make python complain.

keewis avatar Aug 02 '22 10:08 keewis

Thanks, this is very helpful!

atreguier avatar Aug 02 '22 10:08 atreguier