`broadcast_like()` doesn't copy chunking structure

Open slevang opened this issue 1 year ago • 2 comments

What is your issue?

import dask.array
import xarray as xr

da1 = xr.DataArray(dask.array.ones((3,3), chunks=(1, 1)), dims=["x", "y"])
da2 = xr.DataArray(dask.array.ones((3,), chunks=(1,)), dims=["x"])

da2.broadcast_like(da1).chunksizes

Frozen({'x': (1, 1, 1), 'y': (3,)})

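I was surprised not to find any other issues about this. The broadcast `y` dimension comes back as a single chunk of size 3 instead of picking up the `(1, 1, 1)` chunking of `da1`. This feels like a major limitation of the method for a lot of use cases. Is there an easy hack around it?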

slevang avatar Feb 09 '24 02:02 slevang

The fundamental issue here is that we use `np.broadcast_to(da2.data, da1.shape)` (in `Variable.set_dims`) instead of `np.broadcast_arrays(da1.data, da2.data)`. The former only ever sees the shape of `da1`, so chunking info from `da1` isn't propagated.

Switching to the latter would be a decent lift, involving some refactoring of the core broadcasting logic.
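To make the difference concrete, here is a minimal sketch at the dask level. It calls `dask.array.broadcast_to` and `dask.array.broadcast_arrays` directly rather than going through NumPy dispatch or xarray internals, and the variable names are made up for illustration. Note that with plain arrays NumPy rules align the trailing axis, so the new axis here is the leading one, unlike the named-dimension example in the issue.

```python
import dask.array as da

# Stand-ins for da1.data and da2.data (names are illustrative only).
a = da.ones((3, 3), chunks=(1, 1))
b = da.ones((3,), chunks=(1,))

# Shape-only broadcast: nothing about a's chunks is available,
# so the newly created axis becomes one chunk spanning the full length.
shape_only = da.broadcast_to(b, a.shape)

# Array-aware broadcast: both arrays are passed in, so their chunks
# can be unified before broadcasting.
_, array_aware = da.broadcast_arrays(a, b)

print(shape_only.chunks)   # ((3,), (1, 1, 1))
print(array_aware.chunks)  # ((1, 1, 1), (1, 1, 1))
```

`dask.array.broadcast_to` also accepts a `chunks` argument, which is one way the target chunking could be passed through explicitly, though whether that fits xarray's internals is a separate question.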

dcherian avatar Feb 22 '24 04:02 dcherian

Following up, a workaround that seems effective is this:

da2 * xr.ones_like(da1)

`ones_like` dispatches to `dask.array.full`, so the result gets the appropriate chunk structure and a trivial root task (instead of a giant NumPy array), which matters if we want to use this on a distributed client.
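For reference, a small self-contained comparison using the arrays from the original report. This is only a sketch: the chunking printed for `broadcast_like` is what was reported above and may differ across xarray versions.

```python
import dask.array
import xarray as xr

da1 = xr.DataArray(dask.array.ones((3, 3), chunks=(1, 1)), dims=["x", "y"])
da2 = xr.DataArray(dask.array.ones((3,), chunks=(1,)), dims=["x"])

# Behaviour from the report: "y" ends up as a single chunk.
reported = da2.broadcast_like(da1)
print(reported.chunksizes)

# Workaround: multiply by a lazy array of ones that has da1's chunking.
# Regular xarray arithmetic broadcasts da2 against it chunk by chunk.
workaround = da2 * xr.ones_like(da1)
print(workaround.chunksizes)  # "x" and "y" both chunked as (1, 1, 1)
```

One caveat with this approach is that the multiplication follows normal type promotion against the dtype of `da1`, so the result dtype can differ from `da2`'s when the two arrays have different dtypes.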

slevang avatar Mar 26 '24 18:03 slevang