python-rasterstats
New options: Percent coverage selection and weighting
Adds code to generate percent coverage estimates for raster cells by rasterizing vector features at a finer resolution than the raster data, then aggregating back to the raster's resolution. An adjustable scale (the `percent_cover_scale` int arg) controls how accurate the estimates are: a scale of 10 means one order of magnitude finer resolution and is generally good enough to get estimates within 10% of actual coverage; a scale of 100 will usually get you well under 1%, but is slower and requires quite a bit more memory*.
Users can use these coverage estimates either to discard cells that do not meet a minimum coverage (the `percent_cover_selection` float arg) or to weight cells when computing (potentially) more accurate mean, count, and sum stats (the `percent_cover_weighting` bool arg).
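To illustrate the idea, here is a minimal numpy sketch of the two steps described above: aggregating a finer-resolution rasterized mask back to per-cell coverage fractions, then using those fractions to select and weight cells. The function and variable names here are illustrative only, not the PR's actual code, and the fine mask is given directly rather than produced by rasterization.

```python
import numpy as np

def coverage_from_fine_mask(mask_fine, scale):
    """Fraction of each raster cell covered by the feature.

    mask_fine is a boolean mask rasterized at `scale` times finer
    resolution; averaging each scale x scale block yields a
    per-cell coverage estimate in [0, 1].
    """
    h, w = mask_fine.shape
    blocks = mask_fine.reshape(h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(1, 3))

def weighted_stats(values, cover, min_cover=None):
    """Coverage-weighted mean/count/sum over the raster cells.

    If min_cover is given, cells below that coverage are discarded
    first (analogous in spirit to percent_cover_selection).
    """
    w = cover.astype(float)
    if min_cover is not None:
        w = np.where(w >= min_cover, w, 0.0)  # drop low-coverage cells
    total_w = w.sum()
    weighted_sum = (values * w).sum()
    return {
        "mean": float(weighted_sum / total_w),
        "count": float(total_w),       # fractional cell count
        "sum": float(weighted_sum),
    }

# Toy case: a 2x2 raster, feature fully covering the top row only,
# rasterized at scale=2 (so the fine mask is 4x4).
fine = np.ones((4, 4), dtype=bool)
fine[2:, :] = False
cover = coverage_from_fine_mask(fine, 2)   # [[1, 1], [0, 0]]
vals = np.array([[10.0, 20.0], [30.0, 40.0]])
print(weighted_stats(vals, cover))         # mean 15.0, count 2.0, sum 30.0
```

The block-reshape trick is why the memory cost grows quadratically with `percent_cover_scale`: the fine mask has `scale**2` sub-cells per raster cell.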
*I have some misc memory improvements that I can put into another pull request
(I can take care of merging this PR with the latitude correction PR I submitted if you want to use both of them.)
Coverage decreased (-6.4%) to 91.328% when pulling ba627d793de2a57f81b759302bfcc9ada50f0e63 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-5.9%) to 91.837% when pulling 6725da1834cee629a617c7a50c0faae935a25c24 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-6.2%) to 91.575% when pulling a03eb04c1ddb1d69050f14de6073d59569747a01 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage increased (+0.04%) to 97.802% when pulling 7000632805cad9083867712064bcbe1814ac1d17 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage increased (+0.04%) to 97.806% when pulling 85f62a1fc26b3a2d42e7f4f89ee6e272db8dc23c on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Will need to consider how using percent cover as a selection method impacts the nodata stat. It may need to be a separate field?
Coverage increased (+0.02%) to 97.786% when pulling 644ddc3905bbffcd5f226315bc37443ecca36334 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-0.1%) to 99.084% when pulling 1260a4cd671e2ee99ffc2606af2d3b3d5a47644b on sgoodm:percent_cover into a0eef4ee721ff68503c25e073561a98a71768a9d on perrygeo:master.
Coverage increased (+0.5%) to 99.112% when pulling a52055f74458b9f88ccfd5e8e136b3a36385503b on sgoodm:percent_cover into b5727bfa218511cd921978f79c40b97ee06e8d72 on perrygeo:master.
@sgoodm - what is the status of this PR?
@jhamman I have a fork that includes updated code for this PR and the other PRs (#135, #154) I have submitted, but I have not been maintaining the individual branches since the code changes a fair bit when they are all merged. I am happy to update the individual branch if @perrygeo decides to incorporate it.
Thanks for the update @sgoodm. Ping @perrygeo.
Was there any update on this? I would be interested to see these features included in the project.
@sgoodm @benlaken 👋
I haven't forgotten about this one! This project is entirely a volunteer effort for me, and I'm struggling to find the time to give this a proper review.
I love the intent of this PR (pixel weighting has been on my mind since #90) but I have some concerns with this implementation. Detailed review coming...
Sorry if this is off-topic/spam, but I've been working on a similar project that I thought might be of interest. It's some C++ code, exposed as an R package, that computes percent coverage exactly (it's a vector-based calculation), and also extremely quickly (it's faster than most cell-center-based implementations). The approach is described here. It seems like it should be possible to wrap it so that it can be called on numpy arrays.
Hoping to have a chance to get back to this in the next month or two
Finally had some time to come back to this
- stripped out the "latitude correction" stuff I had accidentally merged into the PR branch
- kept the `like` arg in the new and existing rasterization functions
- minimized main-function complexity by moving percent-cover stat calculations to functions in utils
@perrygeo - let me know if you still have concerns regarding the percent_cover_scale or rebin_sum function (or anything else)
Also, I would definitely be interested in checking out a wrapper of what @dbaston has put together
triggering CI rebuild
Hi all - just checking on the status here. This seems to have stalled more than once and I'd like to see if we can help move it forward, if at all possible.
I hate to ask the annoying what's-the-status-of-this-PR question, but here I am :) Is there some known big caveat before attempting to use this change?
@akrherz I haven't tested it extensively but I believe, from a correctness-of-output perspective, you can rely on these changes. You could help by validating the results and posting here.
The remaining work concerns mitigating performance impacts, documentation, API changes, etc. before cutting a release. Nothing too significant, but I simply don't have the time to do the work. If you can use this branch instead of an official release, please do, and let us know how it works for you.
@sgoodm, I'm a geospatial noob, but before I saw your approach to weighted extraction, I saw @dbaston's exact approach, which differs from yours in that yours produces estimated weights. I took an idea from the second answer to this question, How does QGIS Zonal Statistics handle partially overlapping pixels?, which is a simple approach: vectorize each pixel and intersect it with the overlay polygon. Dividing the areas gives the fractional coverage of that pixel, and dividing that by the sum of the fractions gives the weight of that pixel/polygon part. I was wondering if this approach has flaws?
@davidabek1 - vectorizing for exact intersections will not have any flaws in terms of accuracy, but it will be more costly computationally. For a few small geometries, like in that Stack question, the additional computation probably won't matter to you, but if you want exact coverage for many large geometries, the cost of those vector-based intersections over many pixels adds up.
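For anyone curious what the per-pixel vector intersection from that Stack question looks like in practice, here is a pure-Python sketch: clip the polygon to a pixel's square with Sutherland-Hodgman clipping, then take the shoelace area of the result. All names here are illustrative; real code would use a geometry library rather than hand-rolled clipping.

```python
def clip_to_box(poly, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned box
    using Sutherland-Hodgman clipping against each box edge in turn."""
    def clip_edge(pts, inside, intersect):
        out = []
        for i, p in enumerate(pts):
            q = pts[(i + 1) % len(pts)]
            if inside(p):
                out.append(p)
                if not inside(q):
                    out.append(intersect(p, q))  # leaving the half-plane
            elif inside(q):
                out.append(intersect(p, q))      # entering the half-plane
        return out

    def x_cross(p, q, x):  # intersection with a vertical edge
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def y_cross(p, q, y):  # intersection with a horizontal edge
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    pts = poly
    for inside, intersect in [
        (lambda p: p[0] >= xmin, lambda p, q: x_cross(p, q, xmin)),
        (lambda p: p[0] <= xmax, lambda p, q: x_cross(p, q, xmax)),
        (lambda p: p[1] >= ymin, lambda p, q: y_cross(p, q, ymin)),
        (lambda p: p[1] <= ymax, lambda p, q: y_cross(p, q, ymax)),
    ]:
        if not pts:
            break
        pts = clip_edge(pts, inside, intersect)
    return pts

def area(poly):
    """Shoelace area of a polygon given as an (x, y) vertex list."""
    return 0.5 * abs(sum(p[0] * q[1] - q[0] * p[1]
                         for p, q in zip(poly, poly[1:] + poly[:1])))

# A triangle covering exactly half of the unit pixel [0,1] x [0,1]:
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frac = area(clip_to_box(tri, 0.0, 0.0, 1.0, 1.0))  # pixel area is 1
print(frac)  # 0.5
```

Running this per pixel is exactly the cost discussed above: correct, but expensive when repeated over many pixels, which is what exactextract's grid-aware approach avoids.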
@davidabek1 @sgoodm The approach of https://github.com/isciences/exactextract is equivalent to the Stack Exchange link; the difference is that exactextract does the vector-based intersections in a way that takes advantage of the fact that the raster isn't an arbitrary polygon, it's a grid. Done this way, there is no performance penalty for the exact calculation; in many cases it is faster because of all the avoided point-in-polygon tests.