h3_polygon_to_cells: Handling of invalid geometry: 5 point geometry = 2,364,092 hex_ids

Open jmealo opened this issue 1 year ago • 0 comments

Hello,

Some invalid geometries have made it into our dataset and are wreaking a bit of havoc in our H3 indexes.

  • h3_polygon_to_cells in the latest version of h3 returns 2,364,092 hex_ids for one of them at resolution 7.
  • These invalid geometries have 5 points; two of the points appear more than once.
  • The geometry starts and ends at the same point.
  • We're using the latest h3-pg, which uses h3 4.x.
  • ST_MakeValid() turns these into a bow-tie shape.
  • ST_IsValid() returns false.

I'm not sure if h3 should handle any of this or if we should always validate geometry before calling into it. Any thoughts?
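
For anyone triaging similar data outside the database, here is a minimal sketch of a pre-check that flags this kind of self-crossing ring before it reaches H3. It is not what ST_IsValid() does internally and it only detects edges that properly cross; it assumes planar lon/lat coordinates and a closed ring, and the function names are made up for illustration.

```python
from itertools import combinations


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def segments_cross(p1, p2, p3, p4):
    """True if segments p1-p2 and p3-p4 cross at a point interior to both."""
    return (
        _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0
        and _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0
    )


def ring_self_intersects(ring):
    """True if any two non-adjacent edges of a closed ring properly cross.

    Assumes ring[0] == ring[-1] and planar (lon, lat) coordinates.
    """
    edges = list(zip(ring[:-1], ring[1:]))
    n = len(edges)
    for i, j in combinations(range(n), 2):
        # Skip edge pairs that share a vertex: neighbours, and the
        # first/last edge, which meet at the ring's closing point.
        if j == i + 1 or (i == 0 and j == n - 1):
            continue
        if segments_cross(*edges[i], *edges[j]):
            return True
    return False


# The ring from the test case in this issue (the two diagonals cross).
bowtie = [
    (-148.5, 29.1),
    (-148.5, 63.9),
    (-72.5, 29.1),
    (-72.5, 63.9),
    (-148.5, 29.1),
]

# The same four corners visited in order around the rectangle.
rectangle = [
    (-148.5, 29.1),
    (-148.5, 63.9),
    (-72.5, 63.9),
    (-72.5, 29.1),
    (-148.5, 29.1),
]

print(ring_self_intersects(bowtie))     # True
print(ring_self_intersects(rectangle))  # False
```

The check is quadratic in the number of edges, which is fine for small rings like this one. A real pipeline would more likely rely on ST_IsValid() or an equivalent library routine.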

Additional context: We're using a flood fill strategy to convert polygons to h3 cells (the results of which are similar to Snowflake's h3_coverage). This strategy results in 50% fewer hex_ids on invalid inputs. That was still far too many, and before we found the root cause, we decided to test our implementation against h3_polygon_to_cells. We were very surprised to see that h3_polygon_to_cells returned double the hex_ids.

Test case:

{
  "type": "Polygon",
  "coordinates": [
    [
      [-148.5, 29.1],
      [-148.5, 63.9],
      [-72.5, 29.1],
      [-72.5, 63.9],
      [-148.5, 29.1]
    ]
  ]
}

WITH h3_cells AS (
  SELECT h3_polygon_to_cells(
    ST_GeomFromGeoJSON('{
      "type": "Polygon",
      "coordinates": [
        [
          [-148.5, 29.1],
          [-148.5, 63.9],
          [-72.5, 29.1],
          [-72.5, 63.9],
          [-148.5, 29.1]
        ]
      ]
    }'),
    7
  ) AS cells
)
SELECT COUNT(1)
FROM h3_cells;
-- count = 2,364,092

jmealo, Oct 10 '24 17:10