Useful functions not in the Array API Standard

Open TomNicholas opened this issue 2 years ago • 8 comments

There are a few numpy functions which xarray calls on wrapped arrays but which are not (yet) in the Array API Standard. (See https://github.com/data-apis/array-api/issues/187#issuecomment-1553615779) Cubed could choose to implement these to facilitate full integration.

FYI, of the functions on that list, xarray currently uses:

np.clip
np.diff
np.pad
np.repeat
np.take
np.tile
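As a quick reminder of what each of these six functions does, here is a small NumPy-only sketch. The input array and the arguments are arbitrary choices made for illustration and do not come from xarray's code.

```python
import numpy as np

a = np.array([0, 2, 5, 9])

clipped = np.clip(a, 1, 6)      # limit values to [1, 6]  -> [1, 2, 5, 6]
diffed = np.diff(a)             # adjacent differences    -> [2, 3, 4]
padded = np.pad(a, (1, 2))      # 1 zero before, 2 after  -> [0, 0, 2, 5, 9, 0, 0]
repeated = np.repeat(a, 2)      # repeat each element     -> [0, 0, 2, 2, 5, 5, 9, 9]
taken = np.take(a, [3, 0])      # pick by index           -> [9, 0]
tiled = np.tile(a, 2)           # repeat the whole array  -> [0, 2, 5, 9, 0, 2, 5, 9]
```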

Of particular interest to me personally is np.pad. It's used within xarray's .pad method, which is used within xGCM's apply_as_grid_ufunc, which led to the pad function being an important part of the test case that exposed memory management problems with dask's distributed scheduler. I can't really close the loop by trying out cubed on that full original problem unless pad is available in cubed.

pad is also interesting because a parallel implementation isn't trivial - dask's pad implementation uses map_blocks in some cases, but more complicated tricks in other cases. For my purposes above I wouldn't need to implement more than one or two of the mode kwarg options though.
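To make the simplest case concrete, here is a minimal sketch of a constant-mode pad built only from array creation and concatenation. It is written against NumPy so it runs anywhere; `pad_constant` is a hypothetical helper name, and this is neither dask's nor cubed's implementation. The assumption is that a chunked library could do the same with the Array API's `full` and `concat`, leaving other modes (e.g. reflect or wrap) to need more work.

```python
import numpy as np


def pad_constant(x, pad_width, constant_value=0):
    """Hypothetical helper: constant-mode pad using only full + concatenate.

    pad_width is a sequence of (before, after) pairs, one per axis.
    """
    for axis, (before, after) in enumerate(pad_width):
        parts = []
        if before > 0:
            shape = list(x.shape)
            shape[axis] = before
            parts.append(np.full(shape, constant_value, dtype=x.dtype))
        parts.append(x)
        if after > 0:
            shape = list(x.shape)
            shape[axis] = after
            parts.append(np.full(shape, constant_value, dtype=x.dtype))
        # Each pass widens one axis; later axes see the already padded shape.
        x = np.concatenate(parts, axis=axis)
    return x


a = np.arange(6).reshape(2, 3)
result = pad_constant(a, ((1, 0), (0, 2)))
expected = np.pad(a, ((1, 0), (0, 2)), mode="constant")
```

Here `result` should match `expected`, i.e. what `np.pad` returns in its default constant mode for the same `pad_width`.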

I would be interested in submitting a PR for adding pad if that's something you would welcome @tomwhite? (I mentioned this on the xarray tracker but it's really a cubed question https://github.com/pydata/xarray/issues/7848#issuecomment-1553614542)

cc @jbusecke

TomNicholas, May 31 '23 20:05

> I would be interested in submitting a PR for adding pad

That would be very welcome! Just implementing the cases you need seems like a good way forward. (We might want to put it in a different namespace as it's not part of the array API, but we can discuss that later.)

BTW take is already implemented here.

tomwhite, Jun 01 '23 08:06

That would be really cool @TomNicholas. Happy to test out prototypes whenever you think that is useful!

jbusecke, Jun 05 '23 19:06

Xarray doesn't delegate to np.diff so this already works with Cubed:

>>> import xarray as xr
>>> import numpy as np
>>> import cubed.random
>>> da = xr.DataArray(cubed.random.random((3, 4), chunks=(2, 2)), dims=["x", "y"])
>>> d = da.diff("y")
>>> from numpy.testing import assert_array_equal
>>> assert_array_equal(d.values, np.diff(da.values, axis=1))
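A likely reason this works is that a first-order difference needs nothing beyond slicing and subtraction, both of which any Array API implementation provides. The sketch below shows the idea on plain NumPy arrays; `diff_by_slicing` is a hypothetical name, and the code is an illustration of the technique rather than xarray's actual source.

```python
import numpy as np


def diff_by_slicing(x, axis=-1):
    """Hypothetical helper: first-order difference via two shifted slices."""
    axis = axis % x.ndim
    upper = [slice(None)] * x.ndim
    lower = [slice(None)] * x.ndim
    upper[axis] = slice(1, None)    # elements 1..n-1 along the axis
    lower[axis] = slice(None, -1)   # elements 0..n-2 along the axis
    return x[tuple(upper)] - x[tuple(lower)]


rng = np.random.default_rng(0)
data = rng.random((3, 4))
sliced = diff_by_slicing(data, axis=1)
reference = np.diff(data, axis=1)
```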

tomwhite, Sep 21 '23 11:09

clip was added in #583

tomwhite, Oct 01 '24 10:10