python-rasterstats
New options: Percent coverage selection and weighting
Adds code to generate percent coverage estimates for raster cells by rasterizing vector features at a finer resolution than the raster data, then aggregating back to the raster's resolution. An adjustable scale (the `percent_cover_scale` int arg) controls how accurate the estimates are: a scale of 10 means one order of magnitude finer resolution and is generally good enough to get estimates within 10% of actual coverage; a scale of 100 will usually get you well under 1%, but is slower and requires quite a bit more memory*.
Users can use these coverage estimates either to discard cells that do not meet a minimum coverage (the `percent_cover_selection` float arg) or to weight cells when computing (potentially) more accurate mean, count, and sum stats (the `percent_cover_weighting` bool arg).
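To illustrate the idea, here is a minimal numpy sketch of the two steps described above: aggregating a finer-resolution rasterized mask back to per-cell coverage fractions, then using those fractions to select and weight cells. The function and variable names here are illustrative only, not the PR's actual code, and the fine mask is given directly rather than produced by rasterization.

```python
import numpy as np

def coverage_from_fine_mask(mask_fine, scale):
    """Fraction of each raster cell covered by the feature.

    mask_fine is a boolean mask rasterized at `scale` times finer
    resolution; averaging each scale x scale block yields a
    per-cell coverage estimate in [0, 1].
    """
    h, w = mask_fine.shape
    blocks = mask_fine.reshape(h // scale, scale, w // scale, scale)
    return blocks.mean(axis=(1, 3))

def weighted_stats(values, cover, min_cover=None):
    """Coverage-weighted mean/count/sum over the raster cells.

    If min_cover is given, cells below that coverage are discarded
    first (analogous in spirit to percent_cover_selection).
    """
    w = cover.astype(float)
    if min_cover is not None:
        w = np.where(w >= min_cover, w, 0.0)  # drop low-coverage cells
    total_w = w.sum()
    weighted_sum = (values * w).sum()
    return {
        "mean": float(weighted_sum / total_w),
        "count": float(total_w),       # fractional cell count
        "sum": float(weighted_sum),
    }

# Toy case: a 2x2 raster, feature fully covering the top row only,
# rasterized at scale=2 (so the fine mask is 4x4).
fine = np.ones((4, 4), dtype=bool)
fine[2:, :] = False
cover = coverage_from_fine_mask(fine, 2)   # [[1, 1], [0, 0]]
vals = np.array([[10.0, 20.0], [30.0, 40.0]])
print(weighted_stats(vals, cover))         # mean 15.0, count 2.0, sum 30.0
```

The block-reshape trick is why the memory cost grows quadratically with `percent_cover_scale`: the fine mask has `scale**2` sub-cells per raster cell.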
*I have some misc memory improvements that I can put into another pull request
(I can take care of merging this PR with the latitude correction PR I submitted if you want to use both of them.)
Coverage decreased (-6.4%) to 91.328% when pulling ba627d793de2a57f81b759302bfcc9ada50f0e63 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-5.9%) to 91.837% when pulling 6725da1834cee629a617c7a50c0faae935a25c24 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-6.2%) to 91.575% when pulling a03eb04c1ddb1d69050f14de6073d59569747a01 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage increased (+0.04%) to 97.802% when pulling 7000632805cad9083867712064bcbe1814ac1d17 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage increased (+0.04%) to 97.806% when pulling 85f62a1fc26b3a2d42e7f4f89ee6e272db8dc23c on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Will need to consider how using percent cover as a selection method impacts the nodata stat. It may need to be a separate field?
Coverage increased (+0.02%) to 97.786% when pulling 644ddc3905bbffcd5f226315bc37443ecca36334 on sgoodm:percent_cover into 6bdb524aa1471a719ba3ee6491786d2ac5c8a7ab on perrygeo:master.
Coverage decreased (-0.1%) to 99.084% when pulling 1260a4cd671e2ee99ffc2606af2d3b3d5a47644b on sgoodm:percent_cover into a0eef4ee721ff68503c25e073561a98a71768a9d on perrygeo:master.
Coverage increased (+0.5%) to 99.112% when pulling a52055f74458b9f88ccfd5e8e136b3a36385503b on sgoodm:percent_cover into b5727bfa218511cd921978f79c40b97ee06e8d72 on perrygeo:master.
@sgoodm - what is the status of this PR?
@jhamman I have a fork that includes updated code for this PR and the other PRs (#135, #154) I have submitted, but I have not been maintaining the individual branches since the code changes a fair bit when they are all merged. I am happy to update the individual branch if @perrygeo decides to incorporate it.
Thanks for the update @sgoodm. Ping @perrygeo.
Was there any update on this? I would be interested to see these features included in the project.
@sgoodm @benlaken 👋
I haven't forgotten about this one! This project is entirely a volunteer effort for me, and I'm struggling to find the time to give this a proper review.
I love the intent of this PR (pixel weighting has been on my mind since #90) but I have some concerns with this implementation. Detailed review coming...
Sorry if this is off-topic/spam, but I've been working on a similar project that I thought might be of interest. It's some C++ code, exposed as an R package, that computes percent coverage exactly (it's a vector-based calculation), and also extremely quickly (it's faster than most cell-center-based implementations). The approach is described here. It seems like it should be possible to wrap it so that it can be called on numpy arrays.
Hoping to have a chance to get back to this in the next month or two
Finally had some time to come back to this
- stripped out the "latitude correction" stuff I had accidentally merged into the PR branch
- kept the `like` arg in the new and existing rasterization functions
- minimized main-function complexity by moving percent-cover stat calculations to functions in utils
@perrygeo - let me know if you still have concerns regarding the percent_cover_scale or rebin_sum function (or anything else)
Also, I would definitely be interested in checking out a wrapper of what @dbaston has put together
triggering CI rebuild
Hi all - just checking on the status here. This seems to have stalled more than once and I'd like to see if we can help move it forward, if at all possible.
I hate to ask the annoying what's-the-status-of-this-PR question, but here I am :) Is there some known big caveat before attempting to use this change?
@akrherz I haven't tested it extensively but I believe, from a correctness-of-output perspective, you can rely on these changes. You could help by validating the results and posting here.
The remaining work concerns mitigating performance impacts, documentation, API changes, etc. before cutting a release. Nothing too significant, but I simply don't have the time to do the work. If you can use this branch instead of an official release, please do, and let us know how it works for you.
@sgoodm, I'm a geospatial noob, but before I saw your approach to weighted extraction, I saw @dbaston's exact approach, which differs from yours in that yours produces estimated weights. I took an idea from the second answer to this question, How does QGIS Zonal Statistics handle partially overlapping pixels?, which is a simple approach: vectorize each pixel and intersect it with the overlay polygon. Dividing the areas gives the fractional coverage of that pixel, and dividing that by the sum of the fractions gives the weight of that pixel/polygon part. I was wondering if this approach has flaws?
@davidabek1 - vectorizing for exact intersections will not have any flaws in terms of accuracy, but it will be more costly computationally. For a few small geometries, like in that Stack question, the additional computation probably won't matter to you, but if you want exact coverage for many large geometries, the cost of those vector-based intersections over many pixels adds up.
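For anyone curious what the per-pixel vector intersection from that Stack question looks like in practice, here is a pure-Python sketch: clip the polygon to a pixel's square with Sutherland-Hodgman clipping, then take the shoelace area of the result. All names here are illustrative; real code would use a geometry library rather than hand-rolled clipping.

```python
def clip_to_box(poly, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned box
    using Sutherland-Hodgman clipping against each box edge in turn."""
    def clip_edge(pts, inside, intersect):
        out = []
        for i, p in enumerate(pts):
            q = pts[(i + 1) % len(pts)]
            if inside(p):
                out.append(p)
                if not inside(q):
                    out.append(intersect(p, q))  # leaving the half-plane
            elif inside(q):
                out.append(intersect(p, q))      # entering the half-plane
        return out

    def x_cross(p, q, x):  # intersection with a vertical edge
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def y_cross(p, q, y):  # intersection with a horizontal edge
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    pts = poly
    for inside, intersect in [
        (lambda p: p[0] >= xmin, lambda p, q: x_cross(p, q, xmin)),
        (lambda p: p[0] <= xmax, lambda p, q: x_cross(p, q, xmax)),
        (lambda p: p[1] >= ymin, lambda p, q: y_cross(p, q, ymin)),
        (lambda p: p[1] <= ymax, lambda p, q: y_cross(p, q, ymax)),
    ]:
        if not pts:
            break
        pts = clip_edge(pts, inside, intersect)
    return pts

def area(poly):
    """Shoelace area of a polygon given as an (x, y) vertex list."""
    return 0.5 * abs(sum(p[0] * q[1] - q[0] * p[1]
                         for p, q in zip(poly, poly[1:] + poly[:1])))

# A triangle covering exactly half of the unit pixel [0,1] x [0,1]:
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frac = area(clip_to_box(tri, 0.0, 0.0, 1.0, 1.0))  # pixel area is 1
print(frac)  # 0.5
```

Running this per pixel is exactly the cost discussed above: correct, but expensive when repeated over many pixels, which is what exactextract's grid-aware approach avoids.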
@davidabek1 @sgoodm The approach of https://github.com/isciences/exactextract is equivalent to the Stack Exchange link; the difference is that exactextract does the vector-based intersections in a way that takes advantage of the fact that the raster isn't an arbitrary polygon, it's a grid. Done this way, there is no performance penalty for the exact calculation; in many cases it is faster because of all the avoided point-in-polygon tests.