cubed Add map_overlap as a new core op

It would be nice to add map_overlap alongside map_blocks, blockwise, rechunk, and apply_gufunc.

It's currently not directly used within xarray (even within xarray.map_blocks, which builds a HLG), but maybe it could / should be used there (cc @dcherian)

Regardless I think it should be on the wishlist as it is used in some other packages. For example xgcm.apply_as_grid_ufunc uses a pattern where dask.array.map_overlap is called from within the function supplied to xarray.apply_ufunc(), (it wraps the actual numpy function, so the kwarg is dask='allowed'). This is a trick that allows parallelism along all dimensions (both core dims and broadcast dims) for a large class of array algorithms of interest (e.g. differential functions).

dask.map_overlap is mostly implemented using map_blocks.

Jun 02 '23 13:06 TomNicholas

I think map_overlap could be implemented using Cubed's map_direct, which allows you to read arbitrary parts of Zarr arrays (it's used for indexing and concatenation already for example).

Sep 12 '23 10:09 tomwhite

I've started an implementation of map_overlap here: https://github.com/tomwhite/cubed/commit/cd1a15c2acadac22ef8854d7dfe5ac9e50d8cdf0, and it seems to be fairly straightforward. It only supports constant boundary values, but it should be possible to implement some of the other cases too fairly easily.

The nice thing about using map_overlap is that the Cubed implementation is very efficient - essentially a single blockwise with no intermediate Zarr arrays. So for problems like https://github.com/pangeo-data/distributed-array-examples/issues/1, which could use map_overlap to implement a derivative, this is very attractive. This is probably a better approach than using a combination of xp.diff and xp.pad (see #193) since Cubed would use several intermediate Zarr arrays, which would be very difficult to optimize.

Sep 26 '23 15:09 tomwhite

a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays

This statement is confusing to me. Wouldn't diff just use map_overlap?

Sep 26 '23 16:09 dcherian

a combination of xp.diff and xp.pad since Cubed would use several intermediate Zarr arrays

This statement is confusing to me. Wouldn't diff just use map_overlap?

That's certainly one way of implementing it. I assumed that Xarray did this, but it looks like it uses indexing instead; see https://github.com/tomwhite/cubed/issues/193#issuecomment-1729348622.

But my point was that it would be harder to have Cubed optimize a combination of diff and pad atomic operations, compared to the more efficient implementation of map_overlap.

Sep 26 '23 16:09 tomwhite

cubed cubed copied to clipboard

Add map_overlap as a new core op

cubed
cubed copied to clipboard