Large geometry intersection check perf speedup
Above some threshold, it may be faster to load the geometry or geometries into an R*-star tree and query for a subset of intersection candidates.
This is a draft PR because:
-
I haven't benchmarked the tree-query logic in order to figure out even an approximate value for
MAX_NAIVE_SEGMENTS -
I haven't figured out whether checking the number of segments in order to decide to switch to trees is an unacceptable perf hit
-
[x] I agree to follow the project's code of conduct.
-
[ ] I added an entry to
CHANGES.mdif knowledge of this change could be valuable to users.
This is with length checking turned on, RTree (old) vs naïve (new) polygon-line intersection check using the louisiana.rs text fixture (1350 vertices). We're going to need a bigger polygon.
Same test setup (RTree is old, naïve is new), using norway_main.rs (8534 vertices)

Same test setup (RTree is old, naïve is new), using a new big.rs (100610 vertices) valid polygon:

If I'm interpreting this correctly:
- Using a tree for a single Polygon-line intersection check probably isn't worth it for many applications
- The length checks aren't significant in wall-clock terms
- The existing logic is very fast in wall-clock terms!
Now, on to Polygon-Polygon intersection checking…
Now we're talking:
Old (red, naïve) vs new (blue, tree) (sorry for reversing the setup this time!)
Polygon-Polygon intersection check using big.rs and norway_main.rs (100610 and 8534 vertices, respectively)
The naïve version has to potentially do ~860 million line-line intersection checks so 165 ms isn't bad, but the R*-tree mean is only 35.4 ms…
Cool!
Just as a reminder, @rmanoka did a somewhat related investigation of intersection perf here: https://github.com/georust/geo/issues/649
It's a slightly different use case, but the findings might be relevant. In particular, I sort of expect the performance of R-Tree to be better than the naive approach as a.segments() * b.segements() scales.
So if you did want to have a threshold to toggle between "naive" vs. "rtree" using a multiplicative, rather than an additive might make more sense.
if poly_a.num_segments() * poly_b.num_segements() > x {
use_rtree()
} else {
use_naive()
}
Yeah, the multiplicative is definitely better. Next up, I need to figure out a way of getting the large polygon data into the benchmark; include!ing it is taking about 45 mins to compile, and it's only around 2.8 mb.
Maybe https://github.com/georust/geo/issues/566?
(lol not to just dump my wishlist on you or anything...)
Hmm as constructed intersection_candidates_with_other_tree isn't actually producing any candidates for our (known-to-intersect) test data. I wonder what's going on…
Update: we're checking for line intersections in this case, since we're decomposing the polygons into lines. But that excludes the possibility of polygon a completely enclosing polygon b (e.g. large square with smaller square inside). 🤔
If / when #351 lands, the easier case (A contains B) will be solved, but the more tricky case (A contains B inside one of its holes) remains.
Now that https://github.com/georust/geo/pull/829 is merged, we might get comparable perf via something like a.relate(b).is_intersects()
a.relate(b).is_intersects()
relate requires a GeoFloat bound (currently GeoNum), and from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…
from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…
Do you still have an example of the data you were using to see this kind of performance?
I'd expect the R-Tree approach to be a constant factor slower, but better asymptotically — much better for big inputs and a little slower for small inputs. I am curious though as to what range of geometry sizes we can expect to see the tradeoffs.
If you check out this branch and run the boolean ops benchmarks you should see the perf regression in the criterion output. I should also note that this was a quick check that I didn't dig into, so I may have messed something up.
(I rebased against main just in case)