SnappyData on EMR 5.9 (Spark 2.2) and on EMR Spark 2.1 doesn't work. Has anyone tried SnappyData on EMR? Please assist.
Following the Spark quickstart guide:
./bin/spark-shell --conf spark.snappydata.store.sys-disk-dir=quickstartdatadir --conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log --packages "SnappyDataInc:snappydata:1.0.0-s_2.11"
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
NanoTimer::Problem loading library from URL path: /home/hadoop/.ivy2/jars/libgemfire64.so: java.lang.UnsatisfiedLinkError: no gemfire64 in java.library.path
java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SessionState.
@rsachar06 Which version of Spark are you running with? The latest 1.0.0 release is supported with upstream Spark 2.1.1 or with SnappyData's own Spark (i.e., running ./bin/spark-shell from the SnappyData distribution).
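As a quick check, the version in use can be printed from the spark-shell prompt (spark.version is the standard SparkSession field):
// Prints the Spark version the shell is bound to, e.g. 2.2.0 on EMR 5.9
spark.version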
@rsachar06 Checked the available EMR versions. The one supporting Spark 2.1.1 is EMR 5.6.0, which should work with SnappyData (https://aws.amazon.com/about-aws/whats-new/2017/06/updates-to-apache-spark-and-in-transit-encryption-for-presto-in-amazon-emr-release-5-6-0/). Can you try with that version and see?
Unfortunately, Spark minor versions frequently change APIs used by SnappyData (e.g., SessionState, Catalog), so it's not possible to have a single build work against multiple releases. We can make a 2.1.0-compatible build available if required. We have also been looking at alternatives for Smart Connector mode in upcoming releases so that a single build can work with multiple Spark releases.
@rsachar06 The steps you tried would work fine on a local machine or laptop with a Spark 2.1.1 distribution. This is what we call Local mode.
They failed on EMR because EMR uses YARN as the cluster resource manager, so you need a couple of additional steps (this is Smart Connector mode):
- you'll need to start a SnappyData cluster that your Spark (EMR) cluster can access, and
- you'll need to provide the SnappyData cluster info to your spark-shell on EMR as "--conf spark.snappydata.connection=locatorHostname:1527"
./bin/spark-shell --conf spark.snappydata.store.sys-disk-dir=quickstartdatadir --conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log --conf spark.snappydata.connection=locatorHostname:1527 --packages "SnappyDataInc:snappydata:1.0.0-s_2.11"
For step 1), you can do that quickly on AWS via CloudFormation: http://www.snappydata.io/cloudbuilder . When the cluster is running (check the Events or Outputs tab of your AWS CloudFormation stack), replace 'locatorHostname' in step 2) with the EC2 hostname.
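Once the spark-shell on EMR is connected to the SnappyData cluster as above, the SnappySession is used the same way as in the quickstart. A minimal sketch of Smart Connector usage; the table name, schema, and SQL below are illustrative, not taken from the quickstart:
val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)
// Create a column table in the SnappyData cluster and load a couple of rows via SQL
snappy.sql("CREATE TABLE colTable (id INT, name STRING) USING column")
snappy.sql("INSERT INTO colTable VALUES (1, 'one'), (2, 'two')")
// Query it back; the table data is stored in the SnappyData cluster, not on the EMR executors
snappy.sql("SELECT * FROM colTable ORDER BY id").show()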
@rsachar06 It looks like you submitted the cloudbuilder form a few times; were you able to get it to work?