Add a rio.fill.fillnodata-based nodata interpolation in addition to the existing scipy-based one
The issue is that interpolate_na can be much slower than rio.fill.fillnodata.
Simple example:
Reading the Landsat B9 band:
```python
import numpy as np
import rasterio as rio
import rioxarray
import xarray

with rioxarray.open_rasterio(r"path/to/file/B9.tif", chunks=True, lock=True) as tif:
    b9 = tif.load().chunk('auto')
nodata = b9.rio.nodata
```
Then, trying to fill nodata using pure rasterio:
```python
%%time
mask = np.where(b9.data == nodata, 0, 1)
b9 = b9.compute()
b9.data = rio.fill.fillnodata(b9.data, mask, max_search_distance=250)
b9 = b9.chunk('auto')
```
CPU times: total: 22.9 s
Wall time: 1min 33s
About a minute and a half is a pretty reasonable time.
But if we try to fill nodata using rioxarray's interpolate_na:
```python
%%time
b9 = b9.persist()
b9 = b9.rio.interpolate_na()
```
CPU times: total: 11min 2s
Wall time: 27min 25s
The computation is extremely slow (maybe I am doing something wrong?).
The first idea for how rio.fill.fillnodata could work with xarray/dask arrays is to use xarray.apply_ufunc.
```python
%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(
    rio.fill.fillnodata, b9, mask,
    dask='allowed', keep_attrs='override',
    kwargs={'max_search_distance': 250},
)
```
CPU times: total: 29.3 s
Wall time: 1min 40s
```python
%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(
    rio.fill.fillnodata, b9, mask,
    dask='parallelized', keep_attrs='override',
    kwargs={'max_search_distance': 250},
)
b9 = b9.persist()
```
CPU times: total: 38 s
Wall time: 1min 25s
But maybe there are better solutions.
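One direction that might work better (an untested sketch, not something rioxarray provides; the helper name and the depth choice are my own) is to run rio.fill.fillnodata chunk-wise with dask.array.map_overlap, so that pixels within max_search_distance of a chunk edge can still see valid neighbours in the adjacent chunks:

```python
# Untested sketch: fill nodata per chunk, with an overlap of max_search_distance
# pixels so holes near chunk edges can still find valid neighbours next door.
# Assumes a (band, y, x) dask-backed DataArray like the b9 loaded above.
import dask.array as da
import numpy as np
import rasterio as rio


def _fill_block(block, nodata, max_search_distance):
    # rio.fill.fillnodata works on 2D arrays, so loop over the band dimension
    filled = np.empty_like(block)
    for i in range(block.shape[0]):
        band = block[i]
        mask = (band != nodata).astype('uint8')  # 0 marks pixels to interpolate
        filled[i] = rio.fill.fillnodata(
            band.copy(), mask, max_search_distance=max_search_distance
        )
    return filled


max_search_distance = 250
filled = da.map_overlap(
    _fill_block,
    b9.data,
    depth={0: 0, 1: max_search_distance, 2: max_search_distance},
    boundary='none',
    dtype=b9.dtype,
    nodata=nodata,
    max_search_distance=max_search_distance,
)
b9_filled = b9.copy(data=filled)
```

With dask='parallelized' above, every chunk is filled in isolation, so holes touching a chunk boundary only see the valid pixels inside their own chunk; the overlap trades some duplicated work for consistent fills across chunk edges.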
I had thought about using that before. I'm not opposed to using it as an engine for interpolation.
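For illustration, a user-level wrapper (purely hypothetical; fillnodata_interpolate_na, _fill_2d, and their signatures are not an existing rioxarray API, and it assumes the dims are named y/x as open_rasterio produces) could expose the apply_ufunc approach with an interpolate_na-like call:

```python
# Hypothetical wrapper, not part of rioxarray: expose rio.fill.fillnodata as an
# alternative nodata-interpolation engine at the DataArray level.
import rasterio as rio
import rioxarray  # noqa: F401 -- registers the .rio accessor
import xarray


def _fill_2d(arr, nodata, **kwargs):
    # arr arrives as a 2D (y, x) numpy array; 0 in the mask marks pixels to fill
    mask = (arr != nodata).astype('uint8')
    return rio.fill.fillnodata(arr.copy(), mask, **kwargs)


def fillnodata_interpolate_na(da, max_search_distance=100.0, smoothing_iterations=0):
    nodata = da.rio.nodata
    if nodata is None:
        raise ValueError('no nodata value set; call rio.write_nodata first')
    return xarray.apply_ufunc(
        _fill_2d,
        da,
        input_core_dims=[['y', 'x']],
        output_core_dims=[['y', 'x']],
        vectorize=True,  # loop over the band dimension
        dask='parallelized',
        # each band is handed to _fill_2d as one full (y, x) array
        dask_gufunc_kwargs={'allow_rechunk': True},
        output_dtypes=[da.dtype],
        keep_attrs='override',
        kwargs={
            'nodata': nodata,
            'max_search_distance': max_search_distance,
            'smoothing_iterations': smoothing_iterations,
        },
    )


filled = fillnodata_interpolate_na(b9, max_search_distance=250)
```

Because each band is processed as a single block here, this is simpler but less parallel than the map_overlap sketch above; for a single-band raster like B9 that trade-off seems acceptable.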