sedona
sedona copied to clipboard
ST_Difference function crashes while working on some geometries
Behavior
I'm processing some of the OSM geometries, and encountered an exception for both ST_Difference and ST_SymDifference function.
Steps to reproduce the problem
run this query:
spark.sql(
"""
SELECT
ST_Difference(
ST_GeomFromText('MULTIPOLYGON(((-71.0978928 42.3462476,-71.097854 42.34660099999999,-71.0977237 42.34672280000001,-71.0977251 42.34703010000001,-71.0970225 42.34703010000001,-71.0966997 42.346889799999985,-71.0964544 42.3464166,-71.0964974 42.3463562,-71.0966927 42.346306,-71.0969533 42.34629069999997,-71.0973551 42.346184100000016,-71.0977681 42.34614830000001,-71.0978041 42.34615439999999,-71.0978373 42.3461657,-71.0978595 42.34617900000001,-71.0978775 42.34619950000001,-71.0978858 42.3462231,-71.0978928 42.3462476),(-71.0977082 42.3462332,-71.0974766 42.34623529999999,-71.0974664 42.34622680000001,-71.0974562 42.3462193,-71.0974452 42.34621599999997,-71.0974288 42.3462136,-71.0974068 42.346215400000006,-71.0973766 42.34621959999998,-71.0973448 42.34622680000001,-71.0973269 42.34623500000001,-71.0972661 42.3462356,-71.0972661 42.346246699999995,-71.0972995 42.346247300000016,-71.0972881 42.3462639,-71.0972816 42.346278100000006,-71.0972726 42.34629860000004,-71.0972661 42.34632870000004,-71.0972628 42.34635320000001,-71.0972624 42.3463797,-71.0972657 42.346407699999986,-71.0972787 42.34644,-71.0972896 42.3464596,-71.0973031 42.346484000000004,-71.0973218 42.34650779999998,-71.0973426 42.346527400000014,-71.0973644 42.34654550000002,-71.0973946 42.34656509999999,-71.0974268 42.346582299999994,-71.0974619 42.34659640000001,-71.097497 42.34660820000002,-71.0975349 42.346615999999955,-71.0975623 42.3466181,-71.0976006 42.34661959999997,-71.097641 42.346619000000004,-71.0976614 42.3466181,-71.0976981 42.3466124,-71.0977307 42.3466052,-71.0977621 42.346595500000035,-71.0977707 42.346592799999996,-71.0977735 42.34661840000001,-71.0977899 42.3466181,-71.0977878 42.34656899999999,-71.0978009 42.34655360000002,-71.0978074 42.34653829999999,-71.0978139 42.34651109999996,-71.0978151 42.3464912,-71.0978033 42.3464692,-71.0977878 42.34645810000001,-71.0977833 42.346288199999975,-71.0977968 42.34628480000001,-71.0978094 42.346279100000004,-71.0978213 42.34626920000002,-71.0978298 42.34625320000001,-71.0978311 42.34623989999997,-71.097827 42.346225799999985,-71.0978192 42.34621580000001,-71.097807 42.3462059,-71.0977911 42.34619950000001,-71.0977748 42.34619620000001,-71.0977564 42.3461968,-71.0977417 42.34620010000003,-71.0977283 42.34620619999998,-71.0977197 42.34621280000002,-71.0977111 42.346221499999984,-71.0977082 42.3462332),(-71.0978448 42.346496900000005,-71.0978422 42.34644080000001,-71.0978257 42.34644119999999,-71.0978283 42.34649730000001,-71.0978448 42.346496900000005),(-71.0975012 42.3462035,-71.0975011 42.34619219999999,-71.0974257 42.34619270000002,-71.0974259 42.34620399999997,-71.0975012 42.3462035)),((-71.0956691 42.346659200000005,-71.095776 42.34663069999999,-71.0957691 42.3466133,-71.095754 42.3465789,-71.0957458 42.34656380000001,-71.0956338 42.3465922,-71.0954915 42.34633199999999,-71.0956146 42.34621469999999,-71.095645 42.3461858,-71.0958218 42.346544799999975,-71.0958794 42.3465291,-71.0959138 42.34651969999999,-71.0958505 42.34639350000003,-71.0958936 42.34632269999997,-71.0959829 42.34617589999996,-71.096042 42.34607869999999,-71.098156 42.3455084,-71.0981646 42.345526500000034,-71.0981776 42.34555209999999,-71.0982018 42.3455979,-71.0982132 42.34562059999999,-71.0982492 42.345692299999996,-71.0982652 42.345724099999984,-71.0985685 42.346323499999954,-71.0988082 42.34680320000001,-71.0987779 42.346853399999986,-71.0985196 42.34709509999999,-71.0983085 42.3470973,-71.0980514 42.34709960000001,-71.0980515 42.3471198,-71.0978489 42.34712179999997,-71.0978267 42.34712059999998,-71.097826 42.34714120000001,-71.0978255 42.3471557,-71.0977857 42.347155799999996,-71.0968676 42.34715829999999,-71.096866 42.3471251,-71.0967996 42.34712490000001,-71.0964846 42.347123899999985,-71.0964634 42.34712339999999,-71.0964618 42.34710089999999,-71.0960882 42.3470978,-71.0958558 42.34661220000001,-71.0956691 42.346659200000005),(-71.0965049 42.346867299999985,-71.0964854 42.34689119999999,-71.0969005 42.3470643,-71.0969397 42.347100100000006,-71.0977894 42.347098500000016,-71.0977862 42.34674050000001,-71.0979186 42.346618500000005,-71.0979231 42.34657709999999,-71.0979389 42.346431800000005,-71.0979593 42.346230899999995,-71.0979322 42.34617269999998,-71.0978613 42.346124300000014,-71.0977835 42.3461083,-71.0975181 42.34612390000001,-71.0973236 42.34614139999999,-71.0971148 42.34620189999998,-71.0971038 42.34619079999999,-71.09691 42.34624529999999,-71.0966576 42.346251300000006,-71.096472 42.34630279999999,-71.0964351 42.34631490000004,-71.0964176 42.3463247,-71.0964021 42.346337199999965,-71.0963898 42.346351999999996,-71.0963813 42.34636710000004,-71.0963767 42.346381399999984,-71.0963751 42.34640559999997,-71.0963833 42.34643109999999,-71.0963165 42.34645019999999,-71.0963025 42.346454100000045,-71.096334 42.346508099999994,-71.0965049 42.346867299999985),(-71.0978244 42.345882399999994,-71.0978319 42.34582399999999,-71.0978439 42.3457607,-71.0971974 42.3458153,-71.0971806 42.34578429999999,-71.0968599 42.34587120000003,-71.0968806 42.345911,-71.0969008 42.34594959999998,-71.0969132 42.34597339999999,-71.0970628 42.3459335,-71.0972002 42.34591080000001,-71.0976577 42.3458746,-71.0978244 42.345882399999994),(-71.0985254 42.3462691,-71.0984631 42.346283400000004,-71.0984151 42.34665789999997,-71.0985285 42.34666390000004,-71.0987011 42.346617099999975,-71.0985254 42.3462691)),((-71.0964031 42.3464687,-71.0964111 42.346484000000004,-71.0964611 42.34658490000004,-71.0963922 42.3466037,-71.0963346 42.3464869,-71.0964031 42.3464687)),((-71.0957869 42.3461475,-71.095645 42.3461858,-71.0956805 42.3461762,-71.0957869 42.3461475)))'),
ST_GeomFromText('MULTIPOLYGON(((-71.0958218 42.346544799999975,-71.095645 42.3461858,-71.0956805 42.3461762,-71.096042 42.34607869999999,-71.0959829 42.34617589999996,-71.0958936 42.34632269999997,-71.0958505 42.34639350000003,-71.0959138 42.34651969999999,-71.0958794 42.3465291,-71.0958218 42.346544799999975)),((-71.0958218 42.346544799999975,-71.0957458 42.34656380000001,-71.0956338 42.3465922,-71.0954915 42.34633199999999,-71.0956146 42.34621469999999,-71.095645 42.3461858,-71.0958218 42.346544799999975)))')
) AS geom
"""
).show(truncate = False)
Tacktrace:
File "<stdin>", line 1, in <module>
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 423, in show
print(self._jdf.showString(n, int_truncate, vertical))
File "/opt/amazon/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
return f(*a, **kw)
File "/opt/amazon/spark/python/lib/py4j-0.10.9.3-src.zip/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o55.showString.
: org.locationtech.jts.geom.TopologyException: Directed Edge visited twice during ring-building at (-71.0958218, 42.346544799999975, NaN)
at org.locationtech.jts.geomgraph.EdgeRing.computePoints(EdgeRing.java:126)
at org.locationtech.jts.geomgraph.EdgeRing.<init>(EdgeRing.java:51)
at org.locationtech.jts.operation.overlay.MaximalEdgeRing.<init>(MaximalEdgeRing.java:46)
at org.locationtech.jts.operation.overlay.PolygonBuilder.buildMaximalEdgeRings(PolygonBuilder.java:93)
at org.locationtech.jts.operation.overlay.PolygonBuilder.add(PolygonBuilder.java:67)
at org.locationtech.jts.operation.overlay.PolygonBuilder.add(PolygonBuilder.java:56)
at org.locationtech.jts.operation.overlay.OverlayOp.computeOverlay(OverlayOp.java:248)
at org.locationtech.jts.operation.overlay.OverlayOp.getResultGeometry(OverlayOp.java:181)
at org.locationtech.jts.operation.overlay.OverlayOp.overlayOp(OverlayOp.java:84)
at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.getResultGeometry(SnapIfNeededOverlayOp.java:75)
at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.overlayOp(SnapIfNeededOverlayOp.java:37)
at org.locationtech.jts.geom.GeometryOverlay.overlay(GeometryOverlay.java:76)
at org.locationtech.jts.geom.GeometryOverlay.difference(GeometryOverlay.java:89)
at org.locationtech.jts.geom.Geometry.difference(Geometry.java:1391)
at org.apache.sedona.common.Functions.difference(Functions.java:551)
at org.apache.spark.sql.sedona_sql.expressions.ST_Difference$$anonfun$$lessinit$greater$60.apply(Functions.scala:727)
at org.apache.spark.sql.sedona_sql.expressions.ST_Difference$$anonfun$$lessinit$greater$60.apply(Functions.scala:727)
at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction2$2(InferrableFunctionConverter.scala:53)
at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:70)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:483)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:69)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:57)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:524)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1150)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1149)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:473)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:524)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:152)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:209)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.immutable.List.foreach(List.scala:431)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.immutable.List.map(List.scala:305)
at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:209)
at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:362)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDownWithPruning(QueryPlan.scala:152)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:57)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1.applyOrElse(expressions.scala:55)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:519)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:83)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:519)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformWithPruning(TreeNode.scala:485)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.apply(expressions.scala:55)
at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$.apply(expressions.scala:45)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:215)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:212)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:284)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:327)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:284)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:274)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:274)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:188)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$optimizedPlan$1(QueryExecution.scala:146)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan$lzycompute(QueryExecution.scala:142)
at org.apache.spark.sql.execution.QueryExecution.optimizedPlan(QueryExecution.scala:138)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$writePlans$4(QueryExecution.scala:306)
at org.apache.spark.sql.catalyst.plans.QueryPlan$.append(QueryPlan.scala:606)
at org.apache.spark.sql.execution.QueryExecution.writePlans(QueryExecution.scala:306)
at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:323)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:277)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:256)
at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:102)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3768)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2769)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2976)
at org.apache.spark.sql.Dataset.getRows(Dataset.scala:289)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
Settings
Sedona version = 1.5.0
Apache Spark version = 3.2.1
Apache Flink version = ?
API type = Python
Scala version =
JRE version =
Python version = 3.10
Environment = AWS Athena
@ruanqizhen The polygon is invalid for Difference operation because it has nested holes
You can use ST_ExteriorRing (https://sedona.apache.org/1.5.1/api/sql/Function/#st_exteriorring) to only use the exterior ring.
A similar issue has been reported on JTS repo: https://github.com/locationtech/jts/issues/303
@ruanqizhen The polygon is invalid for
Differenceoperation because it has nested holesYou can use ST_ExteriorRing (https://sedona.apache.org/1.5.1/api/sql/Function/#st_exteriorring) to only use the exterior ring.
A similar issue has been reported on JTS repo: locationtech/jts#303
Hi Jia, thank you for the information.
I need to keep the nested holes, is there a way to process "difference" operation with the nested holes?
@ruanqizhen I don't think it is possible to process the difference function with that. You have to use the exterior ring or take those nested holes out as the individual polygons
You can use ST_ExteriorRing (https://sedona.apache.org/1.5.1/api/sql/Function/#st_exteriorring) to only use the exterior ring.