Add a rio.fill.fillnodata-based nodata interpolation in addition to the existing scipy-based one
The issue is that interpolate_na can be much slower than rio.fill.fillnodata.
Simple example:
Reading the Landsat B9 band:
```python
import numpy as np
import rasterio as rio
import rioxarray
import xarray

with rioxarray.open_rasterio(r"path/to/file/B9.tif", chunks=True, lock=True) as tif:
    b9 = tif.load().chunk('auto')
nodata = b9.rio.nodata
```
Then, trying to fill nodata using pure rasterio:
```python
%%time
mask = np.where(b9.data == nodata, 0, 1)
b9 = b9.compute()
b9.data = rio.fill.fillnodata(b9.data, mask, max_search_distance=250)
b9 = b9.chunk('auto')
```
CPU times: total: 22.9 s
Wall time: 1min 33s
About a minute and a half is a pretty reasonable time.
But if we try to fill nodata using rioxarray's interpolate_na:
```python
%%time
b9 = b9.persist()
b9 = b9.rio.interpolate_na()
```
CPU times: total: 11min 2s
Wall time: 27min 25s
The computation is extremely slow (maybe I am doing something wrong?).
The first idea for how rio.fill.fillnodata could work with xarray/dask arrays is to use xarray.apply_ufunc.
```python
%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(
    rio.fill.fillnodata, b9, mask,
    dask='allowed', keep_attrs='override',
    kwargs={'max_search_distance': 250},
)
```
CPU times: total: 29.3 s
Wall time: 1min 40s
```python
%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(
    rio.fill.fillnodata, b9, mask,
    dask='parallelized', keep_attrs='override',
    kwargs={'max_search_distance': 250},
)
b9 = b9.persist()
```
CPU times: total: 38 s
Wall time: 1min 25s
But maybe there are better solutions.
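One direction that might work better (an untested sketch, not something rioxarray provides; the helper name and the depth choice are my own) is to run rio.fill.fillnodata chunk-wise with dask.array.map_overlap, so that pixels within max_search_distance of a chunk edge can still see valid neighbours in the adjacent chunks:

```python
# Untested sketch: fill nodata per chunk, with an overlap of max_search_distance
# pixels so holes near chunk edges can still find valid neighbours next door.
# Assumes a (band, y, x) dask-backed DataArray like the b9 loaded above.
import dask.array as da
import numpy as np
import rasterio as rio


def _fill_block(block, nodata, max_search_distance):
    # rio.fill.fillnodata works on 2D arrays, so loop over the band dimension
    filled = np.empty_like(block)
    for i in range(block.shape[0]):
        band = block[i]
        mask = (band != nodata).astype('uint8')  # 0 marks pixels to interpolate
        filled[i] = rio.fill.fillnodata(
            band.copy(), mask, max_search_distance=max_search_distance
        )
    return filled


max_search_distance = 250
filled = da.map_overlap(
    _fill_block,
    b9.data,
    depth={0: 0, 1: max_search_distance, 2: max_search_distance},
    boundary='none',
    dtype=b9.dtype,
    nodata=nodata,
    max_search_distance=max_search_distance,
)
b9_filled = b9.copy(data=filled)
```

With dask='parallelized' above, every chunk is filled in isolation, so holes touching a chunk boundary only see the valid pixels inside their own chunk; the overlap trades some duplicated work for consistent fills across chunk edges.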
I had thought about using that before. I'm not opposed to using it as an engine for interpolation.
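For illustration, a user-level wrapper (purely hypothetical; fillnodata_interpolate_na, _fill_2d, and their signatures are not an existing rioxarray API, and it assumes the dims are named y/x as open_rasterio produces) could expose the apply_ufunc approach with an interpolate_na-like call:

```python
# Hypothetical wrapper, not part of rioxarray: expose rio.fill.fillnodata as an
# alternative nodata-interpolation engine at the DataArray level.
import rasterio as rio
import rioxarray  # noqa: F401 -- registers the .rio accessor
import xarray


def _fill_2d(arr, nodata, **kwargs):
    # arr arrives as a 2D (y, x) numpy array; 0 in the mask marks pixels to fill
    mask = (arr != nodata).astype('uint8')
    return rio.fill.fillnodata(arr.copy(), mask, **kwargs)


def fillnodata_interpolate_na(da, max_search_distance=100.0, smoothing_iterations=0):
    nodata = da.rio.nodata
    if nodata is None:
        raise ValueError('no nodata value set; call rio.write_nodata first')
    return xarray.apply_ufunc(
        _fill_2d,
        da,
        input_core_dims=[['y', 'x']],
        output_core_dims=[['y', 'x']],
        vectorize=True,  # loop over the band dimension
        dask='parallelized',
        # each band is handed to _fill_2d as one full (y, x) array
        dask_gufunc_kwargs={'allow_rechunk': True},
        output_dtypes=[da.dtype],
        keep_attrs='override',
        kwargs={
            'nodata': nodata,
            'max_search_distance': max_search_distance,
            'smoothing_iterations': smoothing_iterations,
        },
    )


filled = fillnodata_interpolate_na(b9, max_search_distance=250)
```

Because each band is processed as a single block here, this is simpler but less parallel than the map_overlap sketch above; for a single-band raster like B9 that trade-off seems acceptable.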