geomesa
[GEOMESA-3132] Integrated Apache Sedona with GeoMesa Spark SQL
Signed-off-by: Kristin Cowalcijk [email protected]
@Kontinuation this is awesome! Thank you!
In terms of bookkeeping: since Sedona is a new dependency (and may pull in others), Eclipse requires us to check the IP of the dependencies.
For GeoMesa, we do that via a script that builds up the entire list of dependencies. The script is build/calculate-cqs.sh, and it modifies / rebuilds builds/cqs.tsv.
Can you run that script and commit the changes to this PR? (This will allow @elahrvivaz and me to run down any info about the new dependencies!)
The newly generated builds/cqs.tsv was committed. I've skimmed through this file and found four dependency changes:
org.apache.accumulo:accumulo-hadoop-mapreduce 2.0.0 provided
org.apache.sedona:sedona-core-2.4_2.11 1.0.1-incubating provided
org.apache.sedona:sedona-sql-2.4_2.11 1.0.1-incubating provided
org.wololo:jts2geojson 0.16.1 test
sorry we've let this hang out here so long!
@elahrvivaz Did we get the CQ sorted?
@Kontinuation as an update, I haven't forgotten about this. I may be in a better position to test things out in another week or so. (Rather than just merging the PR as is without testing.)
@elahrvivaz Did we get the CQ sorted?
No, do we need to open a CQ? There wasn't anything in ClearlyDefined last I looked, but it's an Apache project.
FYI, see my latest post in https://gitter.im/apache/sedona:
I do not know if or how well you support Spark 2.x on Scala 2.11, but Sedona (and thus also GeoMesa using this PR) might run into the following problem:
The Apache Sedona project claims support for Spark 2.4 (legacy, see https://sedona.apache.org/setup/platform/), which is built using Scala 2.11. However, the main sedona-python-adapter (https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-3.0) does not seem to be published for Scala 2.11 or Spark 2.x.
Where/how can I obtain a build of this project that is compatible with Scala 2.11 and Spark 2.4.x?
Though:
mvn clean install -DskipTests -Dscala=2.11 -Dspark=2.4
seems to be possible (but simply not published).
https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-2.4 seems to be already published
sorry I haven't gotten to this yet, hoping to do so soon!
I ran into an incompatibility in org.locationtech.jts when using BroadcastIndexJoin from Sedona, with this error:
Caused by: java.lang.NoSuchMethodError: org.locationtech.jts.index.quadtree.Quadtree.getRoot()Lorg/locationtech/jts/index/quadtree/Root;
    at org.locationtech.jts.index.quadtree.IndexSerde.write(IndexSerde.java:61)
    at org.apache.sedona.core.geometryObjects.SpatialIndexSerde.write(SpatialIndexSerde.java:66)
    at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
Currently Sedona uses JTS 1.18 and GeoMesa uses 1.17. Does anyone know how to make this work?
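A conflict like this usually means more than one jar on the classpath ships the same JTS class. As a rough way to check, here is a minimal sketch that lists which jars contain the Quadtree class named in the error. It assumes the runtime jars bundle JTS under its original package name (not relocated/shaded); the helper name jars_bundling is hypothetical and not part of GeoMesa or Sedona.

```python
import zipfile

# Class named in the NoSuchMethodError, written as an entry path inside a jar.
JTS_CLASS = "org/locationtech/jts/index/quadtree/Quadtree.class"


def jars_bundling(class_entry, jar_paths):
    """Return the jars (in the given order) that contain class_entry.

    Hypothetical helper for illustration only.
    """
    hits = []
    for jar in jar_paths:
        # A jar is a zip archive, so the standard zipfile module can list it.
        with zipfile.ZipFile(jar) as zf:
            if class_entry in zf.namelist():
                hits.append(str(jar))
    return hits
```

Calling jars_bundling(JTS_CLASS, [...]) with the paths of the Sedona and GeoMesa runtime jars would show whether both of them carry their own copy of JTS.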
I reproduced this problem on PySpark and found a workaround: place the Sedona jar before the GeoMesa jar in the --jars command-line parameter, so that JTS 1.18 takes precedence over JTS 1.17:
spark-submit --jars sedona-python-adapter-3.0_2.12-1.1.1-incubating.jar,geomesa-hbase-spark-runtime-hbase2_2.12-3.5.0-SNAPSHOT.jar [other parameters]
If you are using Java or Scala, I believe a similar approach can be applied.
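If the jar list is assembled in a script, the ordering can be enforced there. The following is a minimal sketch, not an official API: order_jars and its first_marker parameter are hypothetical names, and it relies on the behaviour reported in this thread (earlier --jars entries taking precedence), which is an observation rather than something guaranteed here.

```python
def order_jars(jars, first_marker="sedona"):
    """Build a --jars value with jars whose file name contains first_marker first.

    Hypothetical helper. sorted() is stable, so the relative order of the
    remaining jars is preserved.
    """
    ordered = sorted(jars, key=lambda j: first_marker not in j.rsplit("/", 1)[-1])
    return ",".join(ordered)


# Jar names taken from the spark-submit command in this thread.
jars = [
    "geomesa-hbase-spark-runtime-hbase2_2.12-3.5.0-SNAPSHOT.jar",
    "sedona-python-adapter-3.0_2.12-1.1.1-incubating.jar",
]
print(order_jars(jars))
```

The resulting string can be passed as the value of --jars, or set as spark.jars when building the session from PySpark.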
I've merged in main, fixed conflicts, and bumped the Sedona version to 1.3.1 (and fixed a few resulting errors) here: https://github.com/locationtech/geomesa/pull/2948
Merged as https://github.com/locationtech/geomesa/commit/3d4d9749f7344a0357c6fef2f20e3839ed4deb51. Sorry for the long wait, and thanks for the patience!