geo icon indicating copy to clipboard operation
geo copied to clipboard

Large geometry intersection check perf speedup

Open urschrei opened this issue 3 years ago • 15 comments

Above some threshold, it may be faster to load the geometry or geometries into an R*-star tree and query for a subset of intersection candidates.

This is a draft PR because:

  • I haven't benchmarked the tree-query logic in order to figure out even an approximate value for MAX_NAIVE_SEGMENTS

  • I haven't figured out whether checking the number of segments in order to decide to switch to trees is an unacceptable perf hit

  • [x] I agree to follow the project's code of conduct.

  • [ ] I added an entry to CHANGES.md if knowledge of this change could be valuable to users.


urschrei avatar Dec 22 '21 12:12 urschrei

Screenshot 2021-12-22 at 13 31 36

This is with length checking turned on, RTree (old) vs naïve (new) polygon-line intersection check using the louisiana.rs text fixture (1350 vertices). We're going to need a bigger polygon.

urschrei avatar Dec 22 '21 13:12 urschrei

Same test setup (RTree is old, naïve is new), using norway_main.rs (8534 vertices) Screenshot 2021-12-22 at 13 46 05

urschrei avatar Dec 22 '21 13:12 urschrei

Same test setup (RTree is old, naïve is new), using a new big.rs (100610 vertices) valid polygon: Screenshot 2021-12-22 at 15 37 19

If I'm interpreting this correctly:

  1. Using a tree for a single Polygon-line intersection check probably isn't worth it for many applications
  2. The length checks aren't significant in wall-clock terms
  3. The existing logic is very fast in wall-clock terms!

Now, on to Polygon-Polygon intersection checking…

urschrei avatar Dec 22 '21 16:12 urschrei

Now we're talking:

Old (red, naïve) vs new (blue, tree) (sorry for reversing the setup this time!) Polygon-Polygon intersection check using big.rs and norway_main.rs (100610 and 8534 vertices, respectively)

Screenshot 2021-12-22 at 17 53 34

The naïve version has to potentially do ~860 million line-line intersection checks so 165 ms isn't bad, but the R*-tree mean is only 35.4 ms

urschrei avatar Dec 22 '21 18:12 urschrei

Cool!

Just as a reminder, @rmanoka did a somewhat related investigation of intersection perf here: https://github.com/georust/geo/issues/649

It's a slightly different use case, but the findings might be relevant. In particular, I sort of expect the performance of R-Tree to be better than the naive approach as a.segments() * b.segements() scales.

So if you did want to have a threshold to toggle between "naive" vs. "rtree" using a multiplicative, rather than an additive might make more sense.

if poly_a.num_segments() * poly_b.num_segements() > x {
    use_rtree()
} else {
    use_naive()
}

michaelkirk avatar Dec 22 '21 18:12 michaelkirk

Yeah, the multiplicative is definitely better. Next up, I need to figure out a way of getting the large polygon data into the benchmark; include!ing it is taking about 45 mins to compile, and it's only around 2.8 mb.

urschrei avatar Dec 22 '21 18:12 urschrei

Maybe https://github.com/georust/geo/issues/566?

(lol not to just dump my wishlist on you or anything...)

michaelkirk avatar Dec 22 '21 18:12 michaelkirk

Hmm as constructed intersection_candidates_with_other_tree isn't actually producing any candidates for our (known-to-intersect) test data. I wonder what's going on…

urschrei avatar Dec 23 '21 15:12 urschrei

Update: we're checking for line intersections in this case, since we're decomposing the polygons into lines. But that excludes the possibility of polygon a completely enclosing polygon b (e.g. large square with smaller square inside). 🤔

urschrei avatar Dec 23 '21 16:12 urschrei

If / when #351 lands, the easier case (A contains B) will be solved, but the more tricky case (A contains B inside one of its holes) remains.

urschrei avatar Dec 23 '21 17:12 urschrei

Now that https://github.com/georust/geo/pull/829 is merged, we might get comparable perf via something like a.relate(b).is_intersects()

michaelkirk avatar Jun 16 '22 02:06 michaelkirk

a.relate(b).is_intersects()

relate requires a GeoFloat bound (currently GeoNum), and from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…

urschrei avatar Jun 25 '22 10:06 urschrei

from a first cut, use of relate introduces extreme perf regressions (900 %+ in some case) in our nice new boolean ops intersection benchmark suite…

Do you still have an example of the data you were using to see this kind of performance?

I'd expect the R-Tree approach to be a constant factor slower, but better asymptotically — much better for big inputs and a little slower for small inputs. I am curious though as to what range of geometry sizes we can expect to see the tradeoffs.

michaelkirk avatar Jul 15 '22 10:07 michaelkirk

If you check out this branch and run the boolean ops benchmarks you should see the perf regression in the criterion output. I should also note that this was a quick check that I didn't dig into, so I may have messed something up.

urschrei avatar Jul 15 '22 11:07 urschrei

(I rebased against main just in case)

urschrei avatar Jul 15 '22 11:07 urschrei