geo icon indicating copy to clipboard operation
geo copied to clipboard

WIP: new sweep-line implementation

Open urschrei opened this issue 8 months ago • 9 comments

  • [x] I agree to follow the project's code of conduct.
  • [ ] I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

This is a from-scratch re-implementation of the Bentley-Ottmann sweep line algorithm, based on the JTS version (subsequently also ported to GEOS), and is intended to replace the https://docs.rs/geo/latest/geo/algorithm/sweep/struct.Intersections.html method.

In terms of performance, it is a lot faster than the existing sweep line impl:

In most cases, the existing impl is 45-50x slower than brute force. This implementation is around 1.3x slower, but for use cases in which you want to find a small number of intersections for a large number of lines (2,000 and up), it is considerably faster. Benchmarks can be run using cargo bench --bench sweep_line_intersection.

I've generated a lot of tests. The methodology for most of these is:

  1. Run the sweep-line algorithm and the brute-force algorithm (line_intersection over all pairs of lines), and compare the output: if they don't match, the test fails.
  2. If we panic, the test also fails.

So far, I haven't been able to produce a panic (but this is unsurprising: there are no unwraps or expects…), and sweep line results match brute-force results.

Discussion

This impl doesn't use SweepPoint, Cross, or PointOrLine: we only accept Line<GeoFloat> segments for checking. If we adopt this, we would either have to do more work to accept "Crossable" geometries etc, or deprecate these (and adapt our InteriorPoint algorithm (which uses SweepPoint internally). That is overwhelmingly my preference.

Feedback wanted

Tests that produce incorrect results. I'm going to add https://github.com/georust/geo/issues/948 if I can, but it's looking good so far. Is there anything in our JTS test suite we can use?

urschrei avatar May 12 '25 13:05 urschrei

Split impl and tests, so the actual algorithm is easier to review.

urschrei avatar May 12 '25 13:05 urschrei

@dabreegster tagging you in case this is useful, both for your own uses and because you might have some big real-world data for testing.

urschrei avatar May 12 '25 17:05 urschrei

Another small cause for optimism: the existing sweep line algorithm fails on test_overlapping_segments:

  stderr ───

    thread 'algorithm::new_sweep::tests::test_overlapping_segments' panicked at geo/src/algorithm/new_sweep/tests.rs:39:5:
    assertion `left == right` failed: Sweep algorithm found 54 intersections but brute force found 50
      left: 54
     right: 50

urschrei avatar May 12 '25 20:05 urschrei

Large Datasets/isolated_brute_force_20000
                        time:   [709.20 ms 709.54 ms 709.93 ms]
                        change: [+0.0083% +0.0594% +0.1199%] (p = 0.07 > 0.05)
                        No change in performance detected.
Large Datasets/isolated_new_sweep_20000
                        time:   [7.3726 ms 7.3972 ms 7.4239 ms]
                        change: [-96.138% -96.124% -96.105%] (p = 0.00 < 0.05)
                        Performance has improved.

We're now two orders of magnitude faster than brute force on large sets with sparse intersections.

urschrei avatar May 12 '25 21:05 urschrei

This is exciting! I might not have time to review it this week, but I'll take a look as soon as I can.

michaelkirk avatar May 12 '25 22:05 michaelkirk

This is exciting! I might not have time to review it this week, but I'll take a look as soon as I can.

No rush really. I'm gonna stop tinkering with it now, and maybe some people will suggest evil test cases.

urschrei avatar May 12 '25 23:05 urschrei

This looks like an incredible improvement! I only seem to have one use of this today: detecting places where OSM road segments cross over each other without a graph connection, aka, finding bridge/tunnel crossing points, to force a planar graph. I have never seen a crash from this, and I've never noticed the speed being an issue. So no evil test cases for you, sorry. https://github.com/a-b-street/ltn/blob/7f0178c34de81f5da1163025d3ebf915448b6222/backend/src/route_snapper.rs#L46

dabreegster avatar May 13 '25 08:05 dabreegster

This looks like an incredible improvement! I only seem to have one use of this today: detecting places where OSM road segments cross over each other without a graph connection, aka, finding bridge/tunnel crossing points, to force a planar graph. I have never seen a crash from this, and I've never noticed the speed being an issue. So no evil test cases for you, sorry. https://github.com/a-b-street/ltn/blob/7f0178c34de81f5da1163025d3ebf915448b6222/backend/src/route_snapper.rs#L46

Ah well, if you have a representative dataset you use for that, lmk!

urschrei avatar May 13 '25 10:05 urschrei

Benchmarks and correctness looks awesome @urschrei ! I'll try to review the impl. / provide test cases over the month if time permits, but looks good to go. I'm okay with deprecating the previous impl. as necessary.

I think only the monotone module uses the prev. impl. It should work in any impl. of sweep line as it already assumes there's no "interior-intersection" among the inputs.

rmanoka avatar May 14 '25 03:05 rmanoka

This impl doesn't use SweepPoint, Cross, or PointOrLine: we only accept Line<GeoFloat> segments for checking. If we adopt this, we would either have to do more work to accept "Crossable" geometries etc, or deprecate these (and adapt our InteriorPoint algorithm (which uses SweepPoint internally). That is overwhelmingly my preference.

If I understood you correctly, I think you meant applying your new sweep line implementation to interior_point. I had a go at that here: 25255423c7114d8b60888d9edc13d0f763b7d0a0

Since it seems like this new implementation might be less prone to failure, I am in favor of using it internally everywhere, unless there's some reason not to.

I'd love to see this move forward - anything I can do to help?

michaelkirk avatar Jul 01 '25 22:07 michaelkirk

I've squashed 2525542 into my HEAD to have a look (you should be able to rebase against ee8b99a to produce the same result). The problem we have now is that our monotone module makes heavy use of the deprecated sweep line module, producing a raft of clippy failures. I don't know anything about that module, but I'm pretty sure we can't swap in the new sweep-line algorithm – the existing impl exposes event data and other things that mine doesn't (and can't easily)

urschrei avatar Jul 02 '25 19:07 urschrei

We don't necessarily have to deprecate the old sweep algorithm at this point. I did that in my commit mostly just to identify where all it was being used while developing.

That said, my understanding is that this new sweep implementation is strictly better: very similar from a correctness standpoint and it avoids a class of panics which is great. Addressing panics we know users are hitting is my main interest in moving this forward.

So in that case, it feels weird to offer users the "worse" legacy sweep algorithms. My instinct is that we should just replace the public interface of the old one with the new one. This would be a breaking change, since the API's are slightly different, but that seems worthwhile.

As for the monotone functionality: how about maintaining the old sweep impl, but only as a private implementation detail of Monotone?

That would let us move forward with fixing crashes without losing functionality.

What do you think? Is it important to maintain the old sweep API for other reasons? Or any alternative proposals?

michaelkirk avatar Jul 03 '25 21:07 michaelkirk

Is it important to maintain the old sweep API for other reasons?

My concern was purely about deprecating a big module I didn't write!

My instinct is that we should just replace the public interface of the old one with the new one. This would be a breaking change, since the API's are slightly different, but that seems worthwhile.

Good plan. Done.

As for the monotone functionality: how about maintaining the old sweep impl, but only as a private implementation detail of Monotone?

Creatively renamed to old_sweep, annotated with a warning, and made pub(crate). I can move it fully into the monotone module if it's desired.

Note that the new implementation no longer offers SweepPoint which was a Point with full lexicographic sort capabilities (with the baked-in assumption that the underlying float values would be sortable? dot dot dot etc)

In any case, I had a look at interior_point and I don't think we need a full lexicographic sort, as per my comment, so that should be fine.

urschrei avatar Jul 04 '25 13:07 urschrei

The two new commits are meaningfully separated for (hopefully) ease of review.

urschrei avatar Jul 04 '25 13:07 urschrei

…and finally, a less messy (IMO) benchmark setup, and I managed to replace the HashMap with a lookup, which buys us another 20 % speedup or so on average (sorry!).

urschrei avatar Jul 04 '25 21:07 urschrei

Wondering whether it's sensible to document the presence of this module in the line_intersection module docs: sweep isn't very intuitive, and it's already slightly confusing to have line_intersection and intersects modules (nb I'm not proposing to change that, just wondering how best to point someone at the module that best fits their needs)

urschrei avatar Jul 09 '25 12:07 urschrei

Wondering whether it's sensible to document the presence of this module in the line_intersection module docs: sweep isn't very intuitive, and it's already slightly confusing to have line_intersection and intersects modules (nb I'm not proposing to change that, just wondering how best to point someone at the module that best fits their needs)

Great point. I think more people would be be looking for something like "intersection" rather than "sweep" (which is an implementation detail). Cross linking from these other modules as you suggested seems like a good start.

michaelkirk avatar Jul 10 '25 22:07 michaelkirk

I'm not sure if you're still actively pushing code, so tag me when you're ready for re-review @urschrei

michaelkirk avatar Jul 10 '25 22:07 michaelkirk

OK, have a (hopefully final) look @michaelkirk

urschrei avatar Jul 10 '25 23:07 urschrei

Looks good! I'll leave it to you to merge it @urschrei

michaelkirk avatar Jul 14 '25 17:07 michaelkirk

This impl doesn't use SweepPoint, Cross, or PointOrLine: we only accept Line<GeoFloat> segments for checking. If we adopt this, we would either have to do more work to accept "Crossable" geometries etc

I'm adapting some code to use the new implementation, and hitting a slight snag here. It's sometimes useful to associate arbitrary data with the lines that intersect. As a workaround, I'm hashifying the input lines (by picking a suitable precision and turning f64 into isize), then doing the lookup with the output Lines.

dabreegster avatar Jul 17 '25 12:07 dabreegster

Thanks for being a beta tester 😉

I think we assumed no one was using those traits except us. Now that we know that's not the case, I can follow up to try to make it generic like the old implementation.

michaelkirk avatar Jul 17 '25 19:07 michaelkirk

Thanks for pushing through a newer implementation that doesn't crash!

The workaround is not bad: https://github.com/a-b-street/ltn/commit/6b233ee90b04727267143f32ad4b2edd9dc22cda#diff-94c2fcfb4730187542a1dec886f4b03d40929ac66fdf6f1735d5bb6df7c91a50

The simplest API might be to return the index of the matching lines, ie, ((Line, usize), (Line, usize), LineIntersection. The user can then associate data with that index easily while building the input. It's a little less type-safe than plumbing through a generic associated data T with each Line, or letting people implement a trait, but it would work totally fine and might (totally speculating, haven't been following the new perf improvements PR) be cheaper memory-wise.

dabreegster avatar Jul 18 '25 09:07 dabreegster