xarray-spatial icon indicating copy to clipboard operation
xarray-spatial copied to clipboard

Xrspatial Proximity doesn't work with large rasters even with dask backed array

Open GrahamReveley opened this issue 2 years ago • 3 comments

Xrspatial not working with large dask backed arrays I believe there's a bug in the xrspatial.proximity method whereby the coordinates for a large array (even if its a dask array) are loaded into memory as a numpy array instead of there being a dask array. For context I'm attempting to compute a distance to coast for a 90m global dataset. I had a look into the source code and I believe the issue is here (in the _process function of proximity):

` raster_dims = raster.dims if raster_dims != (y, x): raise ValueError( "raster.coords should be named as coordinates:" "({0}, {1})".format(y, x) )

distance_metric = DISTANCE_METRICS.get(distance_metric, None)
if distance_metric is None:
    distance_metric = DISTANCE_METRICS["EUCLIDEAN"]

target_values = np.asarray(target_values)

# x-y coordinates of each pixel.
# flatten the coords of input raster and reshape to 2d
xs = np.tile(raster[x].data, raster.shape[0]).reshape(raster.shape)
ys = np.repeat(raster[y].data, raster.shape[1]).reshape(raster.shape)

`

Therefore XS and YS are huge numpy arrays that don't fit into memory, whereas, if the input data is a dask array these should probably be dask arrays rather than numpy arrays. Later on in the processing sequence the proximity calculation is either done using dask or numpy and I think there should be a similar thing here for dask/numpy processing.

If I've missed something please let me know, more than happy to share some more code if needs be and in the mean time I can use gdal_proximity.py directly but I guess it would be slower than using the dask backed xrspatial.

Thanks!

GrahamReveley avatar Mar 08 '23 15:03 GrahamReveley

Also ran into this 👍

bstadlbauer avatar Apr 24 '23 10:04 bstadlbauer

@GrahamReveley @bstadlbauer thanks contributing here and I believe this is a known issue to @thuydotm, but hasn't been formally documented. also sorry for delay in responding.

  • [ ] @thuydotm at a minimum we need to document the limitation in proximity.

We are going to prioritize this proximity over the next sprints, but happy to help review / test a PR if y'all want contribute a fix.

brendancol avatar Jun 02 '23 16:06 brendancol

@brendancol We do have a constrained usecase and are working with a changed fork of the proximity code here, but IIRC that's nothing that we can easily generalize to work for all usecases

bstadlbauer avatar Jun 07 '23 07:06 bstadlbauer