
SnappyData on EMR 5.9 (Spark 2.2) and on EMR with Spark 2.1 doesn't work. Has anyone tried SnappyData on EMR? Please assist.

Open rsachar06 opened this issue 7 years ago • 4 comments

Following the Spark quickstart guide:

./bin/spark-shell --conf spark.snappydata.store.sys-disk-dir=quickstartdatadir --conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log --packages "SnappyDataInc:snappydata:1.0.0-s_2.11"

val snappy = new org.apache.spark.sql.SnappySession(spark.sparkContext)

NanoTimer::Problem loading library from URL path: /home/hadoop/.ivy2/jars/libgemfire64.so: java.lang.UnsatisfiedLinkError: no gemfire64 in java.library.path
java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SessionState.<init>(Lorg/apache/spark/sql/SparkSession;)V
  at org.apache.spark.sql.internal.SnappySessionState.<init>(SnappySessionState.scala:57)
  at org.apache.spark.sql.SnappySession.sessionState$lzycompute(SnappySession.scala:117)
  at org.apache.spark.sql.SnappySession.sessionState(SnappySession.scala:113)
  at org.apache.spark.sql.SnappySession.<init>(SnappySession.scala:130)
  at org.apache.spark.sql.SnappySession.<init>(SnappySession.scala:83)
  ... 48 elided

rsachar06, Oct 30 '17 23:10

@rsachar06 Which version of Spark are you running with? The latest 1.0.0 release is supported with upstream Spark 2.1.1 or SnappyData's Spark itself (i.e. running ./bin/spark-shell from SnappyData distribution).

sumwale, Oct 31 '17 15:10

@rsachar06 Checked the available EMR versions. The one supporting Spark 2.1.1 is EMR 5.6.0 which should work with SnappyData (https://aws.amazon.com/about-aws/whats-new/2017/06/updates-to-apache-spark-and-in-transit-encryption-for-presto-in-amazon-emr-release-5-6-0/). Can you try with that version and see?

Unfortunately, Spark minor versions frequently change APIs used by SnappyData (e.g. SessionState, Catalog, etc.), so it's not possible to have a single build work against multiple releases. We can make a 2.1.0-compatible build available if required. We have been looking at alternatives for Smart Connector mode in the coming releases to support multiple Spark releases.
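Since a mismatched Spark release fails only once the session is constructed (as in the NoSuchMethodError above), it may help to check the version before launching. The following is a minimal sketch, not part of SnappyData: the helper names `spark_version_matches` and `extract_spark_version` are hypothetical, and it assumes the first x.y.z number in the `spark-submit --version` banner is the Spark version.

```shell
# Hypothetical helpers (not part of SnappyData or Spark).
# The 1.0.0 release is stated in this thread to support upstream Spark 2.1.1.
REQUIRED_SPARK_VERSION="2.1.1"

# Succeeds only if the given version string is exactly the required release.
# $1: a Spark version string such as "2.2.0"
spark_version_matches() {
  [ "${1:-}" = "$REQUIRED_SPARK_VERSION" ]
}

# Reads banner text on stdin and prints the first x.y.z number found.
# Assumption: that first number is the Spark version.
extract_spark_version() {
  grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Usage on a cluster node (spark-submit prints its banner to stderr):
#   v=$(spark-submit --version 2>&1 | extract_spark_version)
#   spark_version_matches "$v" || echo "Found Spark $v, need $REQUIRED_SPARK_VERSION"
```

An exact match is used rather than a range comparison because, as noted above, even minor Spark releases can break the build.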

sumwale, Nov 01 '17 05:11

@rsachar06 The steps you tried would work fine on a local machine or laptop with a Spark 2.1.1 distribution. That is what we call Local mode.

These failed on EMR because EMR uses YARN as the cluster resource manager, so you need a couple more steps (this is Smart Connector mode):

  1. You'll need to start a SnappyData cluster that your Spark cluster (EMR) can access, and
  2. You'll need to provide the SnappyData cluster info to your spark-shell on EMR as "--conf spark.snappydata.connection=locatorHostname:1527"

./bin/spark-shell --conf spark.snappydata.store.sys-disk-dir=quickstartdatadir --conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log --conf spark.snappydata.connection=locatorHostname:1527 --packages "SnappyDataInc:snappydata:1.0.0-s_2.11"

For step 1, you can do that quickly on AWS via CloudFormation: http://www.snappydata.io/cloudbuilder . Once the cluster is running (check the Events or Outputs tab of your AWS CloudFormation stack), replace 'locatorHostname' in step 2 with the EC2 hostname.
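To avoid hand-editing the placeholder, the substitution can be scripted. This is only a sketch: `snappy_connector_cmd` is a hypothetical helper name, the port defaults to the 1527 used above, and the hostname in the usage line is a made-up example. It prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: print the spark-shell command for Smart Connector mode.
# $1: locator hostname (e.g. the EC2 hostname from the CloudFormation Outputs tab)
# $2: locator port, defaulting to 1527 as used in this thread
snappy_connector_cmd() {
  host="${1:-}"
  port="${2:-1527}"
  if [ -z "$host" ]; then
    echo "usage: snappy_connector_cmd <locatorHostname> [port]" >&2
    return 1
  fi
  # echo joins its arguments with single spaces into one command line
  echo "./bin/spark-shell" \
    "--conf spark.snappydata.store.sys-disk-dir=quickstartdatadir" \
    "--conf spark.snappydata.store.log-file=quickstartdatadir/quickstart.log" \
    "--conf spark.snappydata.connection=${host}:${port}" \
    "--packages SnappyDataInc:snappydata:1.0.0-s_2.11"
}

# Usage (example hostname is made up):
#   snappy_connector_cmd ec2-host.example.com
```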

ashetkar, Nov 09 '17 13:11

@rsachar06 It looks like you submitted the cloudbuilder form a few times; were you able to get it to work?

piercelamb, Nov 09 '17 16:11