grid_tessellate fails with an out-of-bounds error when the supplied geometry is within a quarter of a grid cell of the index edge
The grid_tessellate function produces the error java.lang.IllegalStateException: X coordinate (210.0) out of bounds 0-200, originating from CustomIndexSystem.pointToIndex(), when the supplied geometry is within a quarter of the configured cell size (cell_size/4) of the index boundary.
To Reproduce
Assuming an environment initialized as follows (note: the JTS and ESRI geometry APIs produce the same result):
%python
xmin = 0
xmax = 200
ymin = 0
ymax = 200
grid_size = 20
splits = 2
custom_index_def = f"CUSTOM({xmin},{xmax},{ymin},{ymax},{splits},{grid_size},{grid_size})"
spark.conf.set("spark.databricks.labs.mosaic.index.system", custom_index_def)
import mosaic
mosaic.enable_mosaic(spark, dbutils)
boundary_geom = "POLYGON ((200 0, 200 200, 0 200, 0 0, 200 0))"
test_geom = "POLYGON ((195 180, 195 195, 180 195, 180 180, 195 180))"
test_geom2 = "POLYGON ((196 180, 196 195, 180 195, 180 180, 196 180))"
Both geometries lie within the index bounds:
%python
spark.sql(f"""
SELECT st_contains(
'{boundary_geom}',
geoms
) as within_index
FROM VALUES
('{test_geom}'),
('{test_geom2}') AS data(geoms)
""").show()
+------------+
|within_index|
+------------+
|        true|
|        true|
+------------+
grid_polyfill() produces the correct result:
%python
spark.sql(f"""
SELECT grid_polyfill(
geoms, 0
) as polyfill
FROM VALUES
('{test_geom}'),
('{test_geom2}') AS data(geoms)
""").show()
+--------+
|polyfill|
+--------+
|    [99]|
|    [99]|
+--------+
grid_tessellate() works for the first geometry, test_geom:
spark.sql(f"""
SELECT grid_tessellate('{test_geom}', 0) as tessellate
""").show()
+--------------------+
| tessellate|
+--------------------+
|{[{false, 88, ...|
+--------------------+
But it fails on the second geometry, test_geom2 (the only difference is +1 on the maximum x coordinate):
spark.sql(f"""
SELECT grid_tessellate('{test_geom2}', 0) as tessellate
""").show()
java.lang.IllegalStateException: X coordinate (210.0) out of bounds 0-200
The failing X coordinate (210.0) lies half a grid cell width beyond xmax, i.e. at the centroid of the next cell along the x-axis, which would fall outside the index.
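The arithmetic behind that observation can be checked without Mosaic. The sketch below is plain Python and only assumes that cell centroids sit at the midpoint of each 20-unit cell at resolution 0. It does not reproduce what CustomIndexSystem does internally, and the helper name cell_centroids_x is made up for illustration.

```python
# Illustrative arithmetic only; this is not Mosaic code.
xmin, xmax = 0, 200
grid_size = 20  # cell width at resolution 0, as in the setup above


def cell_centroids_x(xmin, xmax, cell_width):
    """Return the x coordinate of each cell centroid along one row (hypothetical helper)."""
    n_cells = int((xmax - xmin) / cell_width)
    return [xmin + (i + 0.5) * cell_width for i in range(n_cells)]


centroids = cell_centroids_x(xmin, xmax, grid_size)
last_valid = centroids[-1]                # centroid of the last cell inside the index
next_centroid = last_valid + grid_size    # centroid of the cell one step past xmax

print(last_valid, next_centroid, next_centroid > xmax)
```

This prints 190.0 210.0 True, which matches the 210.0 reported in the exception message.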
I suspect the underlying issue is at https://github.com/databrickslabs/mosaic/blob/d21302279194108de9d5cb06c294cca6f36f5768/src/main/scala/com/databricks/labs/mosaic/core/index/CustomIndexSystem.scala#L168 but I haven't been able to narrow it down further than that.
Expected behavior
grid_tessellate() should avoid testing out-of-bounds centroids for geometries that are near the index boundary.
The obvious workaround is to add a buffer to the custom index or pre-filter geometries such that none are within a quarter of a grid cell width/length of the boundary.
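Both workarounds can be sketched in plain Python. The helper names below (padded_custom_index, wkt_bounds, near_index_edge) are hypothetical and not part of Mosaic. The strict cell_size/4 threshold is taken from the two test geometries above, and checking all four sides is a conservative assumption, since only the xmax side was observed to fail. Whether Mosaic accepts the padded bounds (negative values, for example) has not been checked here, and padding changes the cell IDs.

```python
import re


def padded_custom_index(xmin, xmax, ymin, ymax, splits, cell_x, cell_y, pad_cells=1):
    """Build a CUSTOM(...) definition with bounds extended by whole cells (hypothetical helper)."""
    return (
        f"CUSTOM({xmin - pad_cells * cell_x},{xmax + pad_cells * cell_x},"
        f"{ymin - pad_cells * cell_y},{ymax + pad_cells * cell_y},"
        f"{splits},{cell_x},{cell_y})"
    )


def wkt_bounds(wkt):
    """Bounding box (minx, miny, maxx, maxy) from the coordinate pairs of a simple WKT string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    xs = [float(x) for x, _ in pairs]
    ys = [float(y) for _, y in pairs]
    return min(xs), min(ys), max(xs), max(ys)


def near_index_edge(wkt, xmin, xmax, ymin, ymax, cell_x, cell_y):
    """True if the geometry's bounding box is closer than a quarter cell to any index edge."""
    minx, miny, maxx, maxy = wkt_bounds(wkt)
    return (
        minx - xmin < cell_x / 4
        or xmax - maxx < cell_x / 4
        or miny - ymin < cell_y / 4
        or ymax - maxy < cell_y / 4
    )


test_geom = "POLYGON ((195 180, 195 195, 180 195, 180 180, 195 180))"
test_geom2 = "POLYGON ((196 180, 196 195, 180 195, 180 180, 196 180))"

print(padded_custom_index(0, 200, 0, 200, 2, 20, 20))
print(near_index_edge(test_geom, 0, 200, 0, 200, 20, 20))
print(near_index_edge(test_geom2, 0, 200, 0, 200, 20, 20))
```

With the values from this report, the pre-filter flags test_geom2 but not test_geom, so only the failing geometry would be held back from grid_tessellate().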
Thanks for investigating
Thank you for reporting this @bransonf, we will take a look.