geo icon indicating copy to clipboard operation
geo copied to clipboard

simplify_vw_preserve causing self-intersecting ring

Open audunska opened this issue 1 year ago • 12 comments

We have some geometry code which would benefit from avoiding self-intersections after simplification. I tried to switch from the rdp algorithm to the simplify_vw_preserve function, and wrote some property tests. After fuzzing a bit, quickcheck came up with this counterexample:

use geo::polygon;
use geo::SimplifyVwPreserve;
let pol = polygon![
    (x: 1., y: 4.),
    (x: 3., y: 4.),
    (x: 1., y: 1.),
    (x: 7., y: 0.),
    (x: 1., y: 0.),
    (x: 0., y: 1.),
    (x: 1., y: 4.),
];
let simplified = pol.simplify_vw_preserve(&2.25);
assert_eq!(simplified, polygon![
    (x: 1., y: 4.),
    (x: 3., y: 4.),
    (x: 1., y: 1.),
    (x: 7., y: 0.),
    (x: 1., y: 0.),
    (x: 1., y: 4.),
]);

This simplified object has a self-intersection at (x: 1., y: 1.).

Tested on geo version 0.25 and 0.26.

audunska avatar Aug 02 '23 13:08 audunska

If I cyclically permute the coordinates, there is no self-intersection. So this is probably related to handling of the exterior being a linear ring.

Edit: Nope, one more cyclic permutation makes the issue reappear. I think the first cyclic permutation just made the algorithm not consider that particular triangle in question. So the bug is in the algorithm itself.

audunska avatar Aug 02 '23 13:08 audunska

I guess the cartesian_intersect function does not detect intersections with endpoints of lines?

audunska avatar Aug 03 '23 09:08 audunska

Thanks for catching the possible source of the problem @audunska. I'll take a look as soon as I can.

urschrei avatar Aug 03 '23 10:08 urschrei

Do you have a specific example of an endpoint intersection detection miss? I'd like to test it against some variations in the algorithm.

urschrei avatar Aug 03 '23 13:08 urschrei

You can use the points from the simplified geometry above:

l1 = Line::new((1., 1.), (7., 0.));
l2 = Line::new((1., 0.), (1., 4.));
assert!(cartesian_intersects(l1.start_point(), l1.end_point(), l2.start_point(), l2.end_point()));

The point (1., 1.) of the first line touches the second line.

audunska avatar Aug 04 '23 07:08 audunska

On a side note, I ended up implementing the algorithm described in this paper instead, but optimized using an r* tree in the same way as SimplifyVwPreserve: https://doi.org/10.1016/j.cageo.2013.08.011

Would there be interest in contributing that implementation, maybe?

audunska avatar Aug 04 '23 07:08 audunska

I've tuned the algorithm a bit to:

  • be more aggressive in deciding what is an intersection, by using the standard geo intersection check for new lines and candidate existing lines.
  • Reduced the minimum points from 6 to 5, as using 6 was causing your example to exit early while the polygon was in an invalid state. I suspect we might still run into this problem, but I'm not sure it can be completely fixed without implementing some sort of full "rewind" if we hit the minimum point limit while the polygon is invalid.

Could you use my branch (https://github.com/urschrei/geo/tree/main) for your prop tests? I'm not 100% sure if this will work as a fix, but let's see.

In answer to your topology-preserving RDP impl: yes, we'd be delighted with a contribution!

urschrei avatar Aug 07 '23 14:08 urschrei

(My fix passes our existing tests and your specific test, to be clear)

urschrei avatar Aug 07 '23 14:08 urschrei

Got a failure with this linestring on your branch:

let ls = line_string![
    (x: -4., y: 3.),
    (x:  4., y: 3.),
    (x:  1., y: 2.),
    (x: 20., y:-3.),
    (x:  0., y:-3.),
    (x:  0., y: 2.),
    (x: -4., y: 3.),
];
let expected_simplified = line_string![
    (x: -4., y: 3.),
    (x: 20., y:-3.),
    (x:  0., y:-3.),
    (x:  0., y: 2.),
    (x: -4., y: 3.),
];
assert_eq!(ls.simplify_vw_preserve(&6.25), expected_simplified);

audunska avatar Aug 08 '23 13:08 audunska

As far as I can see from testing, the current topology-preserving algorithm can't deal with this case:

The topology-preserving logic tries to fix intersections by removing the previous point (p -1) in the line if possible if p causes an intersection. However, in this case that doesn't work because the point causing the intersection (1, 2) which has index 2 has two adjacent triangles it can check for removal candidates: (-1, 0, 3) and (0, 3, 4). The first can't be used because index 0 is an edge, and the second is skipped because its index (3) is higher than the current point's (2), which means it's the next point, not the previous one.

But even if that weren't the case, the "problem" retained point has index 5, and the simplification algorithm never looks at it because its triangle (4, 5, 6) area is too large (10). This is why this input also fails using the standard VW algorithm.

JTS outputs the correct LineString (0, 3, 4, 6) for that tolerance (tested using TestBuilder), but I don't know how its topology-preservation logic is implemented…

Update: JTS uses epsilon ^ 2 in its VW algorithm, which led me to think that its topology-preservation logic was being used, when in fact it was simply removing the vertex because the triangle area is below the tolerance (39.0625). GEOS and PostGIS produce the same valid linestring / invalid polygon as geo.

urschrei avatar Aug 09 '23 16:08 urschrei

Thinking about this more, the reason the preservation process is failing in the latest case is because of the rule preventing removal of a triangle that forms part of the geometry's "edge". If this case were part of a larger geometry (i.e. there are non-removed points preceding index 0 (Point(-4, 3)), that stopping rule wouldn't be applied and Point(20, -3) would be removed, removing that self-intersection.

As quickcheck tries to find the most simple counter-examples it can, it's going to continue trying small geometries, which are simply far more likely to hit a stopping rule than larger "real world" geometries. This particular case can't be fixed – there aren't enough points left to remove when the intersection occurs.

urschrei avatar Aug 09 '23 21:08 urschrei

Sure! I guess it just means that this algorithm tries hard to preserve topology, but can't quite guarantee it. Should probably document that fact :)

audunska avatar Aug 10 '23 07:08 audunska