cubed
cubed copied to clipboard
np.nanmean executes eagerly on cubed arrays but lazily on dask arrays
See image for demonstration.

np.nanmean is called by xarray's .mean() method when skipna=True, which is the default.
Dask supports the __array_ufunc__ and __array_function__ protocols which is why it works lazily in this case. Cubed doesn't, and nanmean is not in the Array API, so np.nanmean will coerce the Cubed array to a NumPy array eagerly.
There's a bit of a discussion about the nanfunctions in https://github.com/data-apis/array-api/issues/170, including workarounds, but I'm not sure if that would help you @TomNicholas?
nanmean is not in the Array API
Yeah I noticed that - what cubed does here is totally reasonable. I wasn't sure whether to raise this on cubed or xarray.
There's a bit of a discussion about the nanfunctions in https://github.com/data-apis/array-api/issues/170,
Thanks! This comment mentions using the where kwarg to mean, which I presume is something we could change xarray to do instead of calling nanmean (though I expect nanmean is probably more performant?). The where kwarg also isn't in the Array API standard either though by the looks of it. They say there is another discussion about nan-aware functions but I can't find it by searching that repo :confused:
see https://github.com/pydata/xarray/pull/7424
I had a look at how scikit-learn has implemented nan functions, e.g. nanmin:
https://github.com/scikit-learn/scikit-learn/blob/83a8015702213b39510a0f4898bc6879bcdf3ac2/sklearn/utils/_array_api.py#L489-L503
An approach like this would work for Xarray and Cubed, but it wouldn't be very performant as there are multiple rounds of reductions, which Cubed can't optimise well yet. (Note that #284 doesn't really cover the case of optimising one reduction after another).
So I think it would be simpler to add the nan functions to Cubed. I would suggest adding them to cubed.array, rather than cubed.array_api since they haven't been standardised yet. In fact, it might be worth deprecating cubed.array_api in favour of cubed.array, since it is fine for an array implementation to be a superset of the standard.