
Broadcast Timeout on Ontario GeoJson boundary

Open bdgeise opened this issue 7 years ago • 0 comments

Using the attached GeoJSON boundary for Ontario, the job failed with a broadcast timeout when run against roughly 1 billion points. The points dataset is stored as Parquet.

Final app status: FAILED, exitCode: 15, (reason: User class threw exception: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange SinglePartition

The failure appears to be a broadcast timeout in Spark. Increasing spark.sql.autoBroadcastJoinThreshold and spark.sql.broadcastTimeout did not help.

We did notice that the conversion process used to create the GeoJSON structure produces coordinates with very high precision (>15).
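In case the coordinate precision is relevant, here is a minimal sketch of how it could be reduced before loading the boundary. It is not part of the original job and has not been confirmed to fix the timeout. The function names and the default of 6 decimal places are assumptions chosen for illustration; it only uses the Python standard library and works on a GeoJSON document already parsed into a dict.

```python
def round_coords(node, ndigits=6):
    """Recursively round every float inside a nested coordinates array."""
    if isinstance(node, (list, tuple)):
        return [round_coords(item, ndigits) for item in node]
    if isinstance(node, float):
        return round(node, ndigits)
    return node


def reduce_precision(geojson, ndigits=6):
    """Return a copy of a parsed GeoJSON object with rounded coordinates.

    Only values stored under a "coordinates" key are rounded; other
    values (for example numeric feature properties) are left untouched.
    """
    if isinstance(geojson, dict):
        return {
            key: round_coords(value, ndigits)
            if key == "coordinates"
            else reduce_precision(value, ndigits)
            for key, value in geojson.items()
        }
    if isinstance(geojson, list):
        return [reduce_precision(item, ndigits) for item in geojson]
    return geojson
```

The result of json.load on the unzipped boundary file can be passed to reduce_precision and written back out with json.dump. Rounding can make neighbouring vertices coincide, so the output geometry should be checked for validity before it is used.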

ontario_ca.geojson.zip

bdgeise · Jun 20 '18 15:06