mosaic icon indicating copy to clipboard operation
mosaic copied to clipboard

grid_longlatascellid/grid_pointascellid returns h3 hex id that don't intersect with input lon/lat

Open damianrakowski-tomtom opened this issue 3 years ago • 3 comments

Describe the bug

grid_longlatascellid/grid_pointascellid returns h3 hex id that don't intersect with input point.

To Reproduce

import org.apache.spark.sql.functions.{col} import com.databricks.labs.mosaic.functions.MosaicContext import com.databricks.labs.mosaic.H3 import com.databricks.labs.mosaic.ESRI import org.apache.spark.sql.functions._ import org.apache.spark.sql.{DataFrame, Dataset, SparkSession, Column, Row}

val mosaicContext = MosaicContext.build(H3, ESRI) mosaicContext.register(spark) import mosaicContext.functions._

val data = Seq((-120.65246800000001, 40.420067)) val rdd = spark.sparkContext.parallelize(data) rdd.toDF("lon", "lat") .withColumn("point", st_point(col("lon"), col("lat"))) .withColumn("hexPointId", grid_pointascellid(col("point"), 8)) .withColumn("hexId", grid_longlatascellid(col("lon"), col("lat"), 8)) .withColumn("hexGeom", st_geomfromwkb(grid_boundaryaswkb(col("hexId")))) .withColumn("hexwkt", st_aswkt(col("hexGeom"))) .withColumn("intersects", st_intersects(col("point"), col("hexGeom"))) .withColumn("distance", st_distance(col("point"), st_geomfromwkb(col("hexGeom")))) .drop("lon", "lat", "hexGeom") .show(1, false)

Expected behavior Boundary of hex for a given index returned by grid_longlatascellid/grid_pointascellid should intersect with input coordinates.

Screenshots image

Additional context Databricks 10.4 LTS (Scala 2.12, Spark version 3.2.1) mosaic_0_3_0_jar_with_dependencies.jar

Point is near hex edge.

damianrakowski-tomtom avatar Oct 14 '22 13:10 damianrakowski-tomtom

Thanks for reporting this. We are looking into it.

edurdevic avatar Oct 17 '22 15:10 edurdevic

Thanks Damian for reporting this. It actually is an interesting issue. We identified the issue, and we also think we have a work around for it. We need some time to bake the workaround in Mosaic and make sure this does not cause any issue with the downstream analysis.

Let me try to explain what is happening here.

The H3 hexagons are built on top of the icosahedron faces, and then the hex vertices are projected on the Earth as a spherical icosahedron using a gnomonic projection. Please note that only the vertices are projected, while the lines between them are only an approximated representation. The gnomonic projection is not conformal, which means that the shape of objects is not perfectly preserved when projected from the icosahedron faces onto the sphere.

So it is possible (and easy) to accurately map a point from the spherical coordinates to the icosahedron and vice-versa. But when projecting a line, you would need to represent the line with an infinite number of points in order to map them precisely from one domain to the other, because the line in one domain is represented by a curve in the other.

When you use grid_latlonascellid, you are taking a point on the sphere, projecting it on the icosahedron, converting it to hex_id, and returning it. So if this point is very close to the hexagon edge, it can happen that the distortion of the hexagon at the sphere level makes it such that the point does not intersect the hexagon geometry on the sphere coordinates.

There is no way to get around the projection issues, but there is a relatively simple way to identify this situation so that this hexagon approximation does not impact any result of a grid_tessellate operation on border chip analysis. We will get back with a code example on this.

edurdevic avatar Oct 31 '22 09:10 edurdevic

This problem occurs when using the following approach in point-in-polygon joins

  1. tessellating a polygon
  2. joining with a set of indexed points to the tessellated polygons by cell ID
  3. applying st_contains between point and the chip geometry

In a small percentage of cases, the step #3 can incorrectly classify the point as "outside" the chip, while it should be "inside". The work around that consists in doing a few extra checks for the points that match the index ID but do not intersect the chip. This is the condition to re-classify them as "internal"

  • The point falls out of the index_cell_boundary (hexGeom in your example)
  • The distance point-chip is equal to the distance point-index_cell_boundary

When those two conditions are true, we can assume that the point was incorrectly classified.

edurdevic avatar Feb 17 '23 22:02 edurdevic