[GEOMESA-3132] Integrated Apache Sedona with GeoMesa Spark SQL

Open Kontinuation opened this issue 3 years ago • 12 comments

Signed-off-by: Kristin Cowalcijk [email protected]

Kontinuation avatar Sep 19 '21 17:09 Kontinuation

@Kontinuation this is awesome! Thank you!

jnh5y avatar Sep 20 '21 15:09 jnh5y

In terms of some bookkeeping, since Sedona is a new dependency (and may pull in others), Eclipse requires us to check on the IP of dependencies.

For GeoMesa, we do that via a script that builds up the entire list of dependencies. It is at build/calculate-cqs.sh, and it modifies/rebuilds builds/cqs.tsv. Can you run that script and commit the changes to this PR? (This will allow @elahrvivaz and me to run down any info about the new dependencies!)

jnh5y avatar Sep 20 '21 15:09 jnh5y

The newly generated builds/cqs.tsv has been committed. I skimmed through the file and found four dependency changes:

org.apache.accumulo:accumulo-hadoop-mapreduce	2.0.0	provided
org.apache.sedona:sedona-core-2.4_2.11	1.0.1-incubating	provided
org.apache.sedona:sedona-sql-2.4_2.11	1.0.1-incubating	provided
org.wololo:jts2geojson	0.16.1	test
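
For anyone who wants to spot such changes without skimming the file by hand, here is a minimal sketch that compares two versions of the dependency list. It assumes each line carries at least the three tab-separated fields shown above (coordinates, version, scope); `parse_cqs` and `added_entries` are hypothetical helper names, and the "old" contents in the example are a made-up baseline, not the real file.

```python
def parse_cqs(text):
    """Parse 'group:artifact<TAB>version<TAB>scope' lines into a set of tuples.

    Lines with fewer than three tab-separated fields are skipped.
    """
    entries = set()
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue
        entries.add(tuple(fields[:3]))
    return entries


def added_entries(old_text, new_text):
    """Return entries present in new_text but not in old_text, sorted."""
    return sorted(parse_cqs(new_text) - parse_cqs(old_text))


# Made-up baseline for illustration only.
old = "org.wololo:jts2geojson\t0.16.1\ttest\n"
new = old + "org.apache.sedona:sedona-core-2.4_2.11\t1.0.1-incubating\tprovided\n"

for coords, version, scope in added_entries(old, new):
    print(coords, version, scope)
```

A version bump shows up as an added entry because the version is part of the tuple.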

Kontinuation avatar Sep 22 '21 04:09 Kontinuation

sorry we've let this hang out here so long!

elahrvivaz avatar Oct 18 '21 15:10 elahrvivaz

@elahrvivaz Did we get the CQ sorted?

@Kontinuation as an update, I haven't forgotten about this. I may be in a better position to test things out in another week or so. (Rather than just merging the PR as is without testing.)

jnh5y avatar Nov 10 '21 21:11 jnh5y

No, do we need to open a CQ? There wasn't anything in ClearlyDefined last I looked, but it's an Apache project.

elahrvivaz avatar Nov 10 '21 21:11 elahrvivaz

FYI, see my latest post in https://gitter.im/apache/sedona:

I do not know if and how well you support Spark 2.x on Scala 2.11, but Sedona (and thus also GeoMesa using this PR) might run into the following problem:

The Apache Sedona project claims Spark 2.4 support (legacy, see https://sedona.apache.org/setup/platform/), which is built using Scala 2.11. However, the main sedona-python-adapter (https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-3.0) does not seem to be published for Scala 2.11 or Spark 2.x.

Where/how can I obtain a Scala 2.11 and Spark 2.4.x compatible build of this project?

geoHeil avatar Feb 16 '22 13:02 geoHeil

Though: building it locally with mvn clean install -DskipTests -Dscala=2.11 -Dspark=2.4 seems to be possible (the artifact is simply not published).

geoHeil avatar Feb 16 '22 14:02 geoHeil

https://mvnrepository.com/artifact/org.apache.sedona/sedona-python-adapter-2.4 seems to be already published
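
The artifact names mentioned in this thread (sedona-core-2.4_2.11, sedona-sql-2.4_2.11, sedona-python-adapter-3.0_2.12) suggest a `sedona-<module>-<spark compat>_<scala binary>` naming pattern. A minimal sketch of that pattern follows, assuming it holds for the Sedona versions discussed here; `sedona_artifact` is a hypothetical helper, not part of Sedona.

```python
def sedona_artifact(module, spark_compat, scala_binary):
    """Build an artifact id following the naming pattern seen in this thread.

    module       -- e.g. "core", "sql", "python-adapter"
    spark_compat -- Spark compatibility line, e.g. "2.4" or "3.0"
    scala_binary -- Scala binary version, e.g. "2.11" or "2.12"
    """
    return f"sedona-{module}-{spark_compat}_{scala_binary}"


# The Spark 2.4 / Scala 2.11 combination asked about above.
print(sedona_artifact("python-adapter", "2.4", "2.11"))
```

Whether a given combination is actually published still has to be checked on the repository page.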

geoHeil avatar Feb 17 '22 08:02 geoHeil

sorry I haven't gotten to this yet, hoping to do so soon!

elahrvivaz avatar Mar 30 '22 13:03 elahrvivaz

I hit an incompatibility in org.locationtech.jts when using BroadcastIndexJoin from Sedona, with this error:

Caused by: java.lang.NoSuchMethodError: org.locationtech.jts.index.quadtree.Quadtree.getRoot()Lorg/locationtech/jts/index/quadtree/Root;
	at org.locationtech.jts.index.quadtree.IndexSerde.write(IndexSerde.java:61)
	at org.apache.sedona.core.geometryObjects.SpatialIndexSerde.write(SpatialIndexSerde.java:66)
	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)

Currently Sedona uses JTS 1.18 and GeoMesa uses 1.17. Does anyone know how to make this work?
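
To see which jars on the classpath carry their own copy of the JTS class named in the error, something like the following sketch may help. It assumes the runtime jars bundle JTS classes directly (as shaded jars usually do), which I have not verified for every build; `jars_bundling` is a hypothetical helper that only reads the jar's zip index.

```python
import sys
import zipfile

# Class from the NoSuchMethodError above, as a path inside a jar.
JTS_CLASS = "org/locationtech/jts/index/quadtree/Quadtree.class"


def jars_bundling(class_entry, jar_paths):
    """Return the jar paths, in the given order, whose archive contains class_entry."""
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if class_entry in jar.namelist():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Usage: python find_jts.py first.jar second.jar ...
    for hit in jars_bundling(JTS_CLASS, sys.argv[1:]):
        print(hit)
```

If more than one jar is listed, the class is defined twice, and which copy is loaded depends on classpath order.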

tosen1990 avatar Apr 19 '22 06:04 tosen1990

I reproduced this problem on PySpark and found a workaround: place the Sedona jar before the GeoMesa jar when specifying the --jars command line parameter, so that JTS 1.18 takes precedence over JTS 1.17:

spark-submit --jars sedona-python-adapter-3.0_2.12-1.1.1-incubating.jar,geomesa-hbase-spark-runtime-hbase2_2.12-3.5.0-SNAPSHOT.jar [other parameters]

If you are using Java or Scala, I believe a similar approach can be applied.
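
If the jar list is assembled programmatically, the ordering can be enforced in code. Here is a minimal sketch, assuming that jars listed first take precedence as observed above; `order_jars` is a hypothetical helper, and the jar names are the ones from the command line above.

```python
def order_jars(jars, first_marker="sedona"):
    """Move jars whose file name contains first_marker to the front.

    Relative order within each group is preserved.
    """
    preferred = [j for j in jars if first_marker in j.rsplit("/", 1)[-1].lower()]
    rest = [j for j in jars if j not in preferred]
    return preferred + rest


jars = [
    "geomesa-hbase-spark-runtime-hbase2_2.12-3.5.0-SNAPSHOT.jar",
    "sedona-python-adapter-3.0_2.12-1.1.1-incubating.jar",
]

# Comma-separated value suitable for spark-submit --jars.
jars_arg = ",".join(order_jars(jars))
print(jars_arg)
```

I have not checked whether every deploy mode honors this order the same way, so treat it as a starting point.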

Kontinuation avatar Apr 23 '22 16:04 Kontinuation

i've merged in main, fixed conflicts, and bumped the sedona version to 1.3.1 (and fixed a few resulting errors) here: https://github.com/locationtech/geomesa/pull/2948

elahrvivaz avatar Jan 20 '23 21:01 elahrvivaz

merged as https://github.com/locationtech/geomesa/commit/3d4d9749f7344a0357c6fef2f20e3839ed4deb51 sorry for the long wait, thanks for the patience!

elahrvivaz avatar Jan 24 '23 00:01 elahrvivaz