jts RFC: Fixing invalid geometry

trafficstars

It is desirable to have a way to convert an invalid geometry into a valid one that preserves the character of the input (to some extent). This allows the fixed geometry to be used in JTS operations.

See GeometryFixer prototype code.

Goals

Allow converting any geometry to one which is valid, preserving as much as possible of the original appearance

Non-Goals

Preserving all vertices in input - some vertices may be dropped to produce a "natural" repair
Precision reduction - can be done subsequent to processing
Normalization - can be done subsequent to processing
Simplification - can be done before processing

Motivating Cases

Descriptions use following terminology: "touch" means touches in a point, "adjacent" means touches along a line.

See XML file for example data test cases.

Bow-tie polygon
Bow-tie polygon with multiple crossings
Checkerboard polygon
Polygon with zero area and collapses to a line
Polygon with zero area and collapses to a point
Polygon with portion of the shell boundary which collapses to a line (internal or external)
Polygon with portion of the shell boundary which collapses to a line and encloses area
Polygon which overlaps itself
Polygon with linework separating it into two adjacent regions (i.e cutting it in two)
Polygon with collapsed linework that encloses area (i.e. with an inverted boundary that is adjacent)
Polygon with overlapping holes (intersecting in points)
Polygon with overlapping holes (intersecting along edges)
Polygon with adjacent holes
Polygon with holes forming a separated region
Polygon with hole adjacent to shell
Polygon with holes adjacent to shell and separating a region
Polygon with hole overlapping shell
Polygon with hole surrounding shell
Polygon with bowtie hole overlapping shell - intersection inside
Polygon with bowtie hole overlapping shell - intersection outside
Polygon with nested holes - disjoint
Polygon with nested holes - touching
Polygon with nested holes - adjacent
MultiPolygon with overlapping elements
MultiPolygon with adjacent elements
MultiPolygon with polygon with hole overlapping shell and other polygon
GeometryCollection with multiple invalid elements
Line that collapses to a point
Geometry with NaN ordinates

Issues

In PostGIS the semantic is to preserve all input vertices if possible. To do this, collapsed edges in polygonal inputs may result in output containing LineStrings. Is this desirable? Perhaps should be an alternate mode?

Semantics

Result geometry is always a new object
Validity is not checked - input geometry is always processed
- A validity check can be performed by client, if needed
- Valid inputs will still have minor transformations applied
Empty atomic geometries are valid and are returned unchanged (copied)
Remove invalid coordinates (i.e. non-finite X or Y: NaN, Inf, -Inf)
Remove repeated points in sequences
Point - keep valid coordinate, or EMPTY
LineString - fix coordinate list. If left with 1 or 0 coordinates then EMPTY
LinearRing - fix coordinate list. If valid return as a ring, otherwise return as a LineString.
Polygon- transform into a valid polygon, preserving as much of the extent and vertices as possible
MultiPolygon- fix each element, then ensure collection is non-overlapping (via union)
GeometryCollection- fix each element

Polygon Repair

There are two approaches for repairing self-intersecting polygons. Both of them start by forming the fully noded arrangement of the polygon linework.

the Even-Odd strategy treats every delimited region as a potential interior or exterior area, alternating inwards from the exterior.
the Winding Number strategy uses the winding number of each delimited region to determine interior/exterior.
- this can be extended to use both winding number polarities to determine interior. This has the effect of identifying all "lobes" as interior areas.

See Subramaniam for further explanation.

The nominal shell and hole ring(s) are processed separately, and then the results are combined via a different operation to produce the final result.

A further refinement is to identify holes which are disconnected from (outside) the shell, and treat them separately as additional shells. This aligns with the above semantic of treating all linework lobes as polygons interiors.

Processing Options

The following semantic options may be provided as required.

Empty elements in Collections:
- Option 1: (default) remove them
- Option 2: keep them
Collapsed geometry (keepCollapsed)
- Option 1: (default) remove them. Result has same dimension as input.
- Option 2: convert to a representative lower-dimension geometry (note this may result in mixed collections)
Single-element collections
- Option 1: (default) convert to atomic geometry
- Option 2: Keep result as a single-element collection
- LinearRing fixing
- An alternative is to fix linework, node and return all resulting rings.
  - This can be done by converting ring to polygon, fixing polygon, and extracting rings

Prior Art

PostGIS provides ST_MakeValid
GEOS provide a MakeValid capability (although there are some issues with it)
ESRI Java Geometry API makeSimple
Thesis by Subramaniam: PARTITION OF A NON-SIMPLE POLYGON INTO SIMPLE POLYGONS. Advocates using the Non-Zero Winding Number (NZW) approach, which is similar to the JTS algorithm defined here.
Paper (paywalled): A general method for decomposing self-intersecting polygon to normal based on self-intersection points
Boost Geometry Correct - implements the NZW approach in thesis above

Algorithm

Polygons

For each Polygon ring
- Convert self-intersecting ring to a valid set of non-adjacent rings, using the Winding Number of enclosed regions
  - This can be done by combining the results of two buffer(0) operations using opposite choices for winding number polarity
Disconnected outside holes are converted to shells (and do not participate in subsequent hole removal)
Union all repaired holes
Difference holes from original shell Polygon
Combine with converted outside holes

MultiPolygons

Fix each element polygon
Union fixed polygons

Dec 13 '20 20:12 dr-jts

Maybe this paper is worth some consideration : https://agile-online.org/conference_paper/cds/agile_2012/proceedings/papers/paper_ledoux_automatically_repairing_invalid_polygons_with_a_constrained_triangulation_2012.pdf

Dec 15 '20 13:12 mukoki

That's a very interesting paper (especially because it talks about JTS and GEOS!).

However, I'm not convinced that a CDT is the best way to fix invalid polygons. One obvious issue is that it requires writing a CDT algorithm, which is a big undertaking. And it's not clear that the complexity provides any better results than the node-polygonize approach above (and likely has worse performance).

The examples of invalidity in the paper are useful. I wonder if that data is available?

Dec 15 '20 17:12 dr-jts

Didn't JTS already include a CDT algorithm ? There is a repo on github with a few examples of invalid polygons: https://github.com/tudelft3d/prepair

The approach you propose seems not too far from OpenJUMP's MakeValidOp ;-). Seems that you consider that shell vs hole information is reliable. What if you get a Polygon with a hole enclosing the shell. Should the hole "erases" the polygon or should we consider that shell and hole must be swapped ?

Dec 15 '20 19:12 mukoki

Didn't JTS already include a CDT algorithm ?

Nope, it has Conforming DT, which is definitely not what is needed.

The whole CDT approach just seems way more complicated than needed, and I don't think it answers the hard questions about how to create valid polygons.

There is a repo on github with a few examples of invalid polygons: https://github.com/tudelft3d/prepair

Great, could be handy.

The approach you propose seems not too far from OpenJUMP's MakeValidOp ;-).

No doubt!

Seems that you consider that shell vs hole information is reliable. What if you get a Polygon with a hole enclosing the shell. Should the hole "erases" the polygon or should we consider that shell and hole must be swapped ?

Yes, that's a good example. Either could be considered "right". Although it does feel suspect to convert to an empty geometry - that throws away all the input information. I've added this case to the examples list. The approach is definitely subject to change.

There's a similar question about a hole surrounding another hole. Should the inner hole turn into an island, or disappear?

Dec 15 '20 20:12 dr-jts

I have developed a polygon correct approach (remove self-intersection, correct order, correct inners), based on boost geometry: https://github.com/kleunen/boost_geometry_correct

The approach for calculating and removing self-intersection is vastly simpler and more efficient than existing libraries. It was developed for tilemaker: https://github.com/systemed/tilemaker

But can be used for other projects as well.

Jun 13 '21 14:06 kleunen

I have developed a polygon correct approach (remove self-intersection, correct order, correct inners), based on boost geometry: The approach for calculating and removing self-intersection is vastly simpler and more efficient than existing libraries.

Thanks, this looks interesting.

Jun 13 '21 18:06 dr-jts

What is approach that is used now for removing self-intersection from polygons ? I looked through the code, but could not find an approch that is used.

Jun 13 '21 18:06 kleunen

What is approach that is used now for removing self-intersection from polygons ?

The buffer-by-zero heuristic is used to turn self-intersecting rings into valid geometry. Code is here. I this produces a similar semantic to the one advocated in the Boost blog post you referenced.

Jun 13 '21 19:06 dr-jts

Ah yes. It is common and does work well in many practical cases. But in cases with holes. It will not generate the second polygon. The approach i took does generate these multipolygons.

Jun 13 '21 19:06 kleunen

But in cases with holes. It will not generate the second polygon. The approach i took does generate these multipolygons.

Not sure what you mean - do you have an example?

Note the buffer-by-zero call is just the lowest level of GeometryFixer, to fix invalid single rings. For Polygons with holes and MultiPolygons each ring is processed separately, and then recombined using difference and union.

Jun 13 '21 20:06 dr-jts

@kleunen I think GeometryFixer handles the boost_geometry_correct examples in the same way:

POLYGON ((5 0, 2.5 9, 9.5 3.5, 0.5 3.5, 7.5 9, 5 0))

POLYGON ((55 10, 141 237, 249 23, 21 171, 252 169, 24 89, 266 73, 55 10))

POLYGON ((0 0, 10 0, 0 10, 10 10, 0 0, 5 0, 5 10, 0 10, 0 5, 10 5, 10 0, 0 0))

Jun 13 '21 20:06 dr-jts

Yes that is the same. I still have to add rebuilding with difference and union. Boost geometry buffer does not allow buffer with 0. So maybe there is difference in implementation there.

Jun 13 '21 20:06 kleunen

Boost buffer add many new points. It seems jts buffer does not. Or is simplification run afterwards?

Jun 13 '21 20:06 kleunen

Boost geometry buffer does not allow buffer with 0. So maybe there is difference in implementation there.

Using buffer-by-zero was just a quick way to get this functionality rolled out (since it is available in JTS). You could duplicate that semantic by using a node-and-build-topology approach, of course. But then you have to implement the noding and topology graph building code, which is non-trivial.

In fact I would prefer the GeometryFixer implementation to not depend on buffer, since that introduces substantial complexity and might cause breakages at some point (although buffer has been stable for a long time). But that's a fairly major project, so not sure when it will get done.

Jun 13 '21 20:06 dr-jts

Boost buffer add many new points. It seems jts buffer does not. Or is simplification run afterwards?

Do you mean repeated points? If so, that's a result of de-duplication in the buffer and overlay functions.

Jun 13 '21 20:06 dr-jts

Here you can find discussion regarding dissolve/make_valid approach for tilemaker: https://github.com/systemed/tilemaker/issues/246

No, with buffer i get many additional points around the corners. To "round" them. For every corner on polygon you get 3 dots, maybe this is implementation of boost geometry. JTS implementation probably more efficient, generating less support points, because these additional points are not needed.

Jun 13 '21 20:06 kleunen

No, with buffer i get many additional points around the corners. To "round" them. For every corner on polygon you get 3 dots, maybe this is implementation of boost geometry. JTS implementation probably more efficient, generating less support points, because these additional points are not needed.

Yes, that must be Boost geometry buffer behaviour. In the JTS buffer algorithm, buffer distance = 0 is treated as a special case, and the original geometry is used as the offset curve with no other modification. (In contrast, a buffer distance >0 invokes several heuristics to simplify the input linework, and then generates buffer fillets and endcaps as per the input parameters).

Jun 13 '21 21:06 dr-jts

@kleunen One question I have about your algorithm (which I presume also applies to the papers mentioned, although I haven't verified that): how robust is it? Numerical precision limitations can cause nodes computed in a line arrangement to "jump" across one another, which leads to inconsistent topology. JTS contains a lot of code to handle this situation (in particular, buffer checks for consistent noding and falls back to using snap-rounding if it occurs).

However, perhaps this is a case where the effects aren't so severe - although I would be surprised if that was the case.

Unfortunately it's hard to reproduce robustness issues, and they tend to manifest differently for each individual algorithm. Generally the only way to find them is by running many real-world examples. And to detect failure cases, you need an independent way to verify the correctness of computed outputs - which is usually not easy to create. For this fix-invalid-geometry situation, you would need a way not only to verify that the computed result is valid, but that it's valid in a sensible way, according to the desired semantics of the algorithm (since otherwise trivial or wildly malformed output that happened to be valid would be accepted).

Jun 13 '21 22:06 dr-jts

There are no robustness analysis in the papers. The approach basicly reuses the existing points on the ring and add additional points on the ring on the intersections.

One thing i noticed you start traversing this ring and then follow the intersections to detect the non intersecting sub polygons. When doing this, you have to detect you are back at the starting point. For this i used a distance between the current point and the start point. If you do not detect this case, the algorithm keeps traversing the same ring infinitly.

https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L230

Also, there is no proof that by generating these pseudo vertices you can follow the subrings and always end up at the starting point. It is like you says. It works in practice but it is not proven you can not end up in a subring which does not have the starting point. And thus start looping infinitly.

Maybe this will make the algorithm more robust. Instead of detecting you are returned at the starting point, you should stop following the ring if you return to a point you already visited.

Moreover duplicate points should be filteren out. But this is same for other algorithms.

Jun 14 '21 06:06 kleunen

In the case of multiple intersections at the same point, there is a choice to be made to which edge to follow, see below:

On the left case, I start at the intersection which was the first on the original ring, and then I follow the intersection which leads back the fastest to the end of the original ring. This leads back to the starting point.

On the right case, i take an edge to a different sub-ring. Getting stuck in this top-left ring, never being able to return to the original ring and starting point, without following a self-intersecting point.

There is no proof, why the first approach always works and why i do not need to perform an exhaustive search at an intersection to find out which route to take to get on the ring which leads back to the starting point. I think this is because if you look at where the intersections are on the position of the original ring, they are actually nested, the subrings.

Jun 14 '21 06:06 kleunen

Also note I calculate "where" the intersections are on the original edge of the ring: https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L118-L119

This is used to order them such, that they are ordered according to how they appear on the original ring: https://github.com/kleunen/boost_geometry_correct/blob/main/correct.hpp#L58-L71

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

Jun 14 '21 06:06 kleunen

Validating the output is indeed laborious. If the algorithms drops an invalid inner or all inners or returns an empty polygon or multi polygon. The geometry is valid, but clearly is not acceptable. But how many nodes and inners should be in the output? You can not really calculate and automatically test. But i guess you are very familiar with these difficulties.

Jun 14 '21 10:06 kleunen

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

The core problem is that the interesection point of two nearly-collinear lines might be computed to be the opposite side of another line which crosses the two lines. This leads to the computed noded line arrangement still having non-noded intersections. The only solution I've found to handle this is to introducing snapping into the noding subsystem.

Jun 14 '21 21:06 dr-jts

Validating the output is indeed laborious. If the algorithms drops an invalid inner or all inners or returns an empty polygon or multi polygon. The geometry is valid, but clearly is not acceptable. But how many nodes and inners should be in the output? You can not really calculate and automatically test. But i guess you are very familiar with these difficulties.

One way to validate is to use a "point probe" algorithm, which is completely different and simpler, and hence a good guarantee of correctness. This involves generating points near every section of input and output linework, and then determining whether they are correctly placed relative to the original linework. If the Non-Zero Winding number rule is used it should be possible to determine the winding number for a point and then confirm it has an appropriate value. This is compute-intensive but relatively straightforward - which computers are good at!

Jun 14 '21 21:06 dr-jts

So possibly, numerical stability might cause two intersection to swap places, but I think this is not an issue, because they are ordered according to the comparable distance on the original edge.

The core problem is that the interesection point of two nearly-collinear lines might be computed to be the opposite side of another line which crosses the two lines. This leads to the computed noded line arrangement still having non-noded intersections. The only solution I've found to handle this is to introducing snapping into the noding subsystem.

Yes i think the approach in boost geometry is to simply ignore this. Going the approach that double float or even int have a limit on precision and near this limit issues might occur. The user can always pre snap the geometry before applying a make valid if he/she needs this precision.

But i guess that is a matter of preference. This can occur clearly, because the intersection detection is needed. So if you need this to work correctly, you can pre-snap the nodes. But once the intersections are detected, you only need to detect if you are returned to the starting point when walking the sub-ring. For this, you can use a threshold as well.

Jun 15 '21 05:06 kleunen

The NZW or odd-even can be configured/adjusted. The inner rings are detected, i just convert the orientation before performing the union. The ring detection generates three rings, which need to be combined:

1 = CW, 2 = CW, 3 = CCW

I now apply NZW:

Jun 15 '21 07:06 kleunen

i have an multipolygon, wkt like this : MULTIPOLYGON (((104.84831240300008 26.577005471000064, 104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005, 104.84831240300008 26.577005471000064, 104.84886598700007 26.57734819600006, 104.84935045000009 26.57658099500003, 104.84880356000008 26.576273349000076, 104.84831240300008 26.577005471000064)))

when i use GeometryFixer, it bacame to this : POLYGON ((104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005))

while use postgis, it bacame to this:

GEOMETRYCOLLECTION (POLYGON ((104.84948523000008 26.576550350000048, 104.84877030700005 26.57614817800004, 104.84817763600006 26.577031617000046, 104.84889922500008 26.57747835400005, 104.84948523000008 26.576550350000048), (104.84880356000008 26.576273349000076, 104.84935045000009 26.57658099500003, 104.84886598700007 26.57734819600006, 104.84831240300008 26.577005471000064, 104.84880356000008 26.576273349000076)), LINESTRING (104.84831240300008 26.577005471000064, 104.84889922500008 26.57747835400005))

i saw this issue: In PostGIS the semantic is to preserve all input vertices if possible. To do this, collapsed edges in polygonal inputs may result in output containing LineStrings. Is this desirable? Perhaps should be an alternate mode?

for me , i think the one which postgis maked is correct. and an alternate mode will make it more adaptable in a lot of scenarios.

Nov 03 '22 02:11 13661172102

jts jts copied to clipboard

RFC: Fixing invalid geometry

Goals

Non-Goals

Motivating Cases

Issues

Semantics

Polygon Repair

Processing Options

Prior Art

Algorithm

Polygons

MultiPolygons

jts
jts copied to clipboard