
grid_tessellate fails with out-of-bounds error if supplied geometry is within a quarter of a grid cell of the index edge

Open bransonf opened this issue 1 year ago • 1 comments

The grid_tessellate function fails with java.lang.IllegalStateException: X coordinate (210.0) out of bounds 0-200, originating from CustomIndexSystem.pointToIndex(), when the supplied geometry lies within a quarter of the configured cell size (cell_size/4) of the index boundary.

To Reproduce

Assuming an environment initialized as follows (note: the JTS and ESRI geometry APIs produce the same result):

%python

xmin = 0
xmax = 200
ymin = 0
ymax = 200
grid_size = 20
splits = 2
custom_index_def = f"CUSTOM({xmin},{xmax},{ymin},{ymax},{splits},{grid_size},{grid_size})"
spark.conf.set("spark.databricks.labs.mosaic.index.system", custom_index_def)

import mosaic
mosaic.enable_mosaic(spark, dbutils)

boundary_geom = "POLYGON ((200 0, 200 200, 0 200, 0 0, 200 0))"
test_geom = "POLYGON ((195 180, 195 195, 180 195, 180 180, 195 180))"
test_geom2 = "POLYGON ((196 180, 196 195, 180 195, 180 180, 196 180))"

Both geoms are obviously within the index:

%python

spark.sql(f"""
 SELECT st_contains(
   '{boundary_geom}',
   geoms
 ) as within_index
 FROM VALUES
 ('{test_geom}'),
 ('{test_geom2}') AS data(geoms)
""").show()
+------------+
|within_index|
+------------+
|        true|
|        true|
+------------+

grid_polyfill() produces the correct result

%python

spark.sql(f"""
 SELECT grid_polyfill(
   geoms, 0
 ) as polyfill
 FROM VALUES
 ('{test_geom}'),
 ('{test_geom2}') AS data(geoms)
""").show()

+--------+
|polyfill|
+--------+
|    [99]|
|    [99]|
+--------+

grid_tessellate() works for the first test_geom

spark.sql(f"""
SELECT grid_tessellate('{test_geom}', 0) as tessellate
""").show()
+--------------------+
|          tessellate|
+--------------------+
|{[{false, 88, ...|
+--------------------+

But it fails on test_geom2 (the only difference is +1 on the max x coordinate)

spark.sql(f"""
SELECT grid_tessellate('{test_geom2}', 0) as tessellate
""").show()
java.lang.IllegalStateException: X coordinate (210.0) out of bounds 0-200

The failing X coordinate (210) lies half a grid cell width (10) beyond the upper x bound (200), i.e. it is the centroid of the next grid cell along the x-axis, which falls outside the index.
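The arithmetic can be checked with a short sketch. It assumes, as the configuration above suggests, that resolution 0 uses square cells of width grid_size laid out from xmin; cell_center_x is a hypothetical helper for illustration and is not part of Mosaic.

```python
def cell_center_x(column, xmin=0.0, cell_size=20.0):
    """Centre x of a 0-based cell column (hypothetical helper, not Mosaic API)."""
    return xmin + (column + 0.5) * cell_size


xmin, xmax, cell_size = 0.0, 200.0, 20.0
n_columns = int((xmax - xmin) / cell_size)  # 10 columns at resolution 0

# Centre of the last column that is still inside the index.
last_valid = cell_center_x(n_columns - 1, xmin, cell_size)
# Centre of the column one step past the edge: the coordinate in the error.
next_over = cell_center_x(n_columns, xmin, cell_size)

print(last_valid, next_over, next_over > xmax)
```

The second value matches the coordinate in the exception, which is consistent with a cell just past the edge being evaluated.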

I suspect the underlying issue is at https://github.com/databrickslabs/mosaic/blob/d21302279194108de9d5cb06c294cca6f36f5768/src/main/scala/com/databricks/labs/mosaic/core/index/CustomIndexSystem.scala#L168 but I haven't been able to figure out further than that.

Expected behavior

grid_tessellate() should avoid testing out-of-bounds centroids for geometries that are near the index boundary.

The obvious workaround is to add a buffer to the custom index or pre-filter geometries such that none are within a quarter of a grid cell width/length of the boundary.
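A minimal sketch of the pre-filter workaround, in plain Python so it can run outside Spark. It assumes the cell_size/4 threshold observed above applies to all four edges (only the max-x edge was actually tested here), and it reads coordinates from simple 2D WKT with a regular expression rather than a real parser. Both function names are hypothetical.

```python
import re


def wkt_bounds(wkt):
    """Return (minx, miny, maxx, maxy) for a simple 2D WKT string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


def too_close_to_boundary(wkt, xmin, xmax, ymin, ymax, cell_size):
    """True if the geometry's bounding box is within cell_size/4 of any edge.

    Assumption: the threshold is symmetric across all four index edges.
    """
    margin = cell_size / 4
    minx, miny, maxx, maxy = wkt_bounds(wkt)
    return (
        minx - xmin < margin
        or xmax - maxx < margin
        or miny - ymin < margin
        or ymax - maxy < margin
    )


test_geom = "POLYGON ((195 180, 195 195, 180 195, 180 180, 195 180))"
test_geom2 = "POLYGON ((196 180, 196 195, 180 195, 180 180, 196 180))"

for geom in (test_geom, test_geom2):
    print(too_close_to_boundary(geom, 0, 200, 0, 200, 20))
```

Geometries flagged by this check could be dropped, clipped, or routed through grid_polyfill() instead, until the boundary handling is fixed.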

Thanks for investigating

bransonf, Apr 04 '23 19:04

Thank you for reporting this @bransonf, we will take a look.

edurdevic, Apr 12 '23 13:04