xarray
xarray copied to clipboard
Which scipy.stats functions can be used with xarray.reduce()?
What is your issue?
I would like to use scipy.stats.skew and scipy.stats.kurt with multidimensional xarrays, and also with results of the "coarsen" function.
I don't see information about which functions can be used in the documentation, and I find that the errors that I get are difficult to understand (for me).
I'm not sure it is a bug, though, so I report it as an issue. Here is a test code:
import xarray as xr
import scipy.stats
import numpy as np
# define random dataarrays, 1D and 2D --
da = xr.DataArray( np.random.rand(100),dims=['x'])
da2 = xr.DataArray( np.random.rand(10,10),dims=['x','y'])
# Work on the whole one dimensional array, OK --
print('skewness 1D:',da.reduce(func=scipy.stats.skew).values)
#
# with 2D array it does not work --
print('skewness 2D:',da2.reduce(func=scipy.stats.skew).values)
# This gives error: ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1 --
#
# try to compute the skewness in bins of 10 using coarsen: does not work --
daskew=da.coarsen(x=10).reduce(func=scipy.stats.skew)
# gives error: TypeError: tuple indices must be integers or slices, not tuple --
#
daskew=da.coarsen(x=10).reduce(func=scipy.stats.skew,axis=None)
# gives error: TypeError: skew() got multiple values for keyword argument 'axis'
The general requirement on reduce functions is that they behave like numpy
ufuncs, e.g. np.mean
(xarray
does not support the other ways to call ufuncs, e.g. np.mul.reduceat
). As far as I know, that means that they have to take at least the data and a axis
argument that can either be a int, a tuple of int, or None
(the default). We could definitely improve the documentation on this.
The scipy.stats
functions don't seem to be true ufuncs: unlike numpy
's functions, they will by default not reduce along all axes (axis=None
) but along the first axis (axis=0
).
To be able to use them with the standard reduce, you'd have to write a wrapper that changes that default:
def skew(data, *, axis=None, **kwargs):
return scipy.stats.skew(data, axis=axis, **kwargs)
da2.reduce(skew) # works
Edit: to make this a bit easier, you can also use a decorator:
def change_axis_default(func):
def wrapper(*args, axis=None, **kwargs):
return func(*args, axis=axis, **kwargs)
# can't use functools.wraps because that will pretend to have the old signature
wrapper.__doc__ = func.__doc__
wrapper.__name__ = func.__name__
wrapper.__qualname__ = func.__qualname__
return wrapper
da2.reduce(change_axis_default(scipy.stats.skew))
Edit: with scipy>=1.9.0
the scipy.stats
functions support tuple axis, so the second hack will not be needed anymore.
hack for scipy
def stack_axes(arr, axis):
if isinstance(axis, int):
axis = (axis,)
remainder = tuple(a for a in range(arr.ndim) if a not in axis)
transposed = np.transpose(arr, remainder + axis)
new_shape = transposed.shape[: len(remainder)] + (-1,)
return np.reshape(transposed, new_shape)
def skew(data, *, axis=None, **kwargs):
if isinstance(axis, tuple):
data = stack_axes(data, axis=axis)
axis = -1
return scipy.stats.skew(data, axis=axis, **kwargs)
da.coarsen(x=10).reduce(func=skew) # works
da.reduce(func=skew) # still works
Finally, the last call fails because coarsen(...).reduce
already passes a axis
kwarg so passing one again will make python complain.
Thanks, this is very helpful!