python-rasterstats
python-rasterstats copied to clipboard
Multiband support
zonalstats
was originally optimized for large vector feature collections against a single raster band (i.e. thousands of polygons against a DEM raster)
There is another use case which zonalstats
doesn't handle so well: many rasters/bands against a single geometry (i.e. the value at a single point for thousands of climate model simulations or a multispectral image). You can loop through and call zonal stats on each raster/band but this incurs the overhead of the rasterio.open each time. A more efficient mechanism might be to stack all the data into a multi-band raster and open it once, slice the 3D array and don't apply the aggregate functions to the 3rd dim - return an array rather than a scalar statistic.
I'll have to do some research to see how much refactoring/code clutter this would cause. Otherwise, it might just be a new zonalstats_multiband
function.
Work started in multiband
branch
Still need:
- [ ] bands for arrays (right now we assume 2D)
- [ ] check for 2 or 3D (4D not supported)
- [ ] deprecate, document the
band_num
toband
change - [ ] categorical by band?
- [ ] nodata by band?
- [ ] additional stats - apply by band or let the function implement the handling or bands?
- [ ] when all bands specified, read raster meta instead of assuming
[1, 2, 3]
array.mean(axis=(-1, -2)) # Works; the mean of the last two axes
masked.mean(axis=(-1, -2)) # TypeError: tuple indices must be integers, not tuple
There is an open PR (https://github.com/numpy/numpy/pull/5706) that serves as a proof of concept fix. This would need to be applied to all the functions used by the stats:
min, max, mean, count, std, sum, median, unqiue, percentile
Maybe looping through the bands is the only viable approach for now - even if we could make those changes and get them merged, it will be a while before they make it into mainstream releases.
+1 for reopening this one.
For my use case it would be crucial to have some multiband feature. For instance, I would like to perform stand based statistics on vector values, e.g. pixel brightness (combination of RGB values), pixel hue, etc. These cannot be independently calculated by bands and combined later, the transformation must be before aggregation.
(Returning vector values would be also nice, but this could be worked around, calculating vector, and return selected vector component.)