python-rasterstats nodata count includes non-overlapping cells

For each feature, there are three types of cells: data, nodata, and non-overlapping. Nodata and non-overlapping get lumped together in the nodata statistic.

Was this always the case? Was the bug introduced in an attempt to optimize?

[ ] write a failing test case for the partial overlap case
[ ] fix it

Dec 18 '15 10:12 perrygeo

Also, let's consider nans.

Here's an example, crudely drawn. The raster is 3 pixels tall with a nodata value of 0, the geometry covers 6 pixels. photo on 9-3-16 at 10 19 am

So the breakdown of cell counts should be

1 valid data cell (count)
1 nodata cell (nodata)
1 nan cell
3 non-overlapping cells

The problem currently is that a) we don't track nan cell counts at all and b) the non-overlapping ones get lumped into nodata

Still debating if it's better to do

{'count': 1, 'nodata': 5} 
# or
{'count': 1, 'nodata': 1, 'nan': 1, 'no_overlap': 3}

IOW is there really a need to distinguish between the three types of invalid data cells?

Sep 03 '16 14:09 perrygeo

i would track each type and then let the user decide how they are returned. group them by default with an option to return them separately

Sep 03 '16 20:09 sgoodm

Implementation is a little trickier than expected, deferring this to the 0.12 milestone

Oct 01 '16 11:10 perrygeo

I do not think it's a problem and we need not count non-overlapping pixels. From my understanding, the non-overlapping is not a part of the raster. From above picture, the raster is inside the red box and black is nodata. The blue box is the geometry which covers 8 pixels. So: {'count': 2, 'nodata': 6} The white pixels do not belong to the raster. In your example, I prefer this result: {'count': 1, 'nodata': 1, 'nan': 1}

We can not get the result like this: {'count': 1, 'nodata': 5}

Mar 20 '17 09:03 XiaopingDu

looks like the non overlapping areas get lumped into nodata due to the the boundless options (boundless_array func in rasterstats for arrays, and equivalent steps in rasterio for rasters) which either initialize the boundless output with the nodata value (boundless_array) or fill it with the nodata value before it gets returned (rasterio)

this seems to be when it was introduced for arrays, rasters were probably earlier https://github.com/perrygeo/python-rasterstats/commit/e50d89b1f8d1769e7e5b026cf8d5f4e7662cd929

Mar 22 '17 20:03 sgoodm

this might work: if the user wants no_overlap count returned, they need to provide a second, unused "fake" nodata value. this value is passed to rasterio as the raster's nodata value which results in the true nodata value being treated as just another value. we then have distinct values for no_overlap vs nodata and just need to update our mask accordingly

in the case when the user does not request the no_overlap field, it just behaves exactly as it currently does

should be able to tweak the boundless_array func to do the same thing for np arrays

Mar 22 '17 21:03 sgoodm

PR #146 adds 'nan' stat option

Mar 24 '17 19:03 sgoodm

rasterio issue on shapes extending beyond raster extent https://github.com/mapbox/rasterio/issues/995

Mar 24 '17 20:03 sgoodm

@perrygeo almost done with a no overlap stat proof of concept that works with array rasters. but i think there may be some limitations to getting it working with rasters opened via rasterio

Mar 27 '17 18:03 sgoodm

Mine shapefile polygon is not snapped with raster when i run zonal_stats results are not exactly matching with Arc GIS result. I am using Python 3.7 and pygeoprocessing 1.5. Is there any way by which we can match Arc GIS result?

Dec 21 '18 05:12 princemathur

python-rasterstats python-rasterstats copied to clipboard

nodata count includes non-overlapping cells

python-rasterstats
python-rasterstats copied to clipboard