h3
h3_polygon_to_cells: Handling of invalid geometry: 5 point geometry = 2,364,092 hex_ids
Hello,
Some invalid geometries have made it into our dataset and are wreaking a bit of havoc in our H3 indexes.
- h3_polygon_to_cells in the latest version of h3 returns 2,364,092 hex_ids for one of them.
- These invalid geometries have 5 points; two points appear multiple times.
- The geometry starts and ends at the same point.
- We're using the latest h3-pg, which uses h3 4.x.
- The ST_MakeValid() function turns these into a bow-tie shape.
- ST_IsValid() returns false.
I'm not sure if h3 should handle any of this or if we should always validate geometry before calling into it. Any thoughts?
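For reference, validating up front could look roughly like the sketch below. It assumes a hypothetical table shapes(id, geom) holding the polygons; the table and column names are illustrative only, not our actual schema. It relies only on the PostGIS functions ST_IsValid() and ST_IsValidReason() and on h3_polygon_to_cells from h3-pg.

```sql
-- Hypothetical table: shapes(id, geom), with geom stored as a PostGIS polygon.
-- Only geometries that pass ST_IsValid() are handed to h3_polygon_to_cells.
SELECT s.id,
       h3_polygon_to_cells(s.geom, 7) AS cell
FROM shapes AS s
WHERE ST_IsValid(s.geom);

-- List the rejected geometries together with the reason PostGIS reports,
-- so they can be repaired or reviewed separately.
SELECT s.id,
       ST_IsValidReason(s.geom) AS reason
FROM shapes AS s
WHERE NOT ST_IsValid(s.geom);
```

Whether rejected geometries should be skipped or repaired with ST_MakeValid() depends on the dataset, so this sketch only reports them.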
Additional context:
We're using a flood-fill strategy to convert polygons to h3 cells (the results of which are similar to Snowflake's h3_coverage). On invalid inputs, this strategy produces 50% fewer hex_ids than h3_polygon_to_cells. That was still way too many, and before we found the root cause, we decided to test our implementation against h3_polygon_to_cells. We were very surprised to see that it returned double the hex_ids.
Test case:
{
  "type": "Polygon",
  "coordinates": [
    [
      [-148.5, 29.1],
      [-148.5, 63.9],
      [-72.5, 29.1],
      [-72.5, 63.9],
      [-148.5, 29.1]
    ]
  ]
}
WITH h3_cells AS (
  SELECT h3_polygon_to_cells(
    ST_GeomFromGeoJSON('{
      "type": "Polygon",
      "coordinates": [
        [
          [-148.5, 29.1],
          [-148.5, 63.9],
          [-72.5, 29.1],
          [-72.5, 63.9],
          [-148.5, 29.1]
        ]
      ]
    }'),
    7  -- H3 resolution
  ) AS cells
)
SELECT COUNT(1)
FROM h3_cells;
-- count = 2,364,092
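The validity observations listed above can be checked against the same test geometry with a query along these lines. This is a minimal sketch that uses only standard PostGIS functions; the output column names are arbitrary.

```sql
WITH g AS (
  SELECT ST_GeomFromGeoJSON('{
    "type": "Polygon",
    "coordinates": [
      [
        [-148.5, 29.1],
        [-148.5, 63.9],
        [-72.5, 29.1],
        [-72.5, 63.9],
        [-148.5, 29.1]
      ]
    ]
  }') AS geom
)
SELECT ST_IsValid(geom)              AS is_valid,  -- false for this ring, as noted above
       ST_IsValidReason(geom)        AS reason,    -- text description of the validity problem
       ST_AsText(ST_MakeValid(geom)) AS repaired   -- WKT of the repaired (bow-tie) geometry
FROM g;
```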