Apache Zeppelin on remote host
Hello, I'm trying to configure Apache Zeppelin to work with a SnappyData cluster on a remote host. I don't understand one point in the documentation: _In the classpath option, define the location where the SnappyData Interpreter is downloaded by adding -classpath=/<download_location>/snappydata-zeppelin-<version_number>.jar_
This is clear to me, and it works when I have everything on one host, but what should I put in the lead conf file when Zeppelin is on a remote host? Should I copy snappydata-zeppelin-<version_number>.jar to the SnappyData folder and set the classpath to the local file in the SnappyData location?
Thank you in advance for your help :D
Yes, basically, the snappydata-zeppelin-<version_number>.jar needs to be copied to the SnappyData lead node, and the classpath in the lead conf should point to that local copy.
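For reference, a lead conf entry might look like the sketch below; the hostname and download path are placeholders, not taken from your setup:

```
# conf/leads on the SnappyData lead node (hostname, path and version are placeholders)
localhost -zeppelin.interpreter.enable=true \
  -classpath=/opt/snappydata/interpreter/snappydata-zeppelin-<version_number>.jar
```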
Hello, thank you. It works. But I would like to ask one more question. I have a Spark standalone cluster and a SnappyData cluster, and I would like to use Spark as the engine and SnappyData as the "database":
- In the SnappyData interpreter configuration there are the parameters spark.app.name and master, but I don't see anything on the Spark master console - no applications, no jobs. On the SnappyData dashboard, however, I see everything. It looks like all actions were executed only on SnappyData; no requests were sent to the standalone Spark cluster.
- I set exactly the same value for master as in the Zeppelin Spark interpreter - actions from that interpreter do run on the Spark standalone cluster.
Could you help me with how to configure Zeppelin and SnappyData so that I can use both Spark and SnappyData?
For using Zeppelin in the smart connector mode (i.e. separate SnappyData and Spark clusters), you simply configure Zeppelin to use the Spark interpreter with the Spark cluster as usual. Just add the "spark.snappydata.connection" property to the interpreter conf, pointing to the locator's thrift host:port (1527 by default), as in a normal smart connector configuration. No Zeppelin configuration on the lead is required. The instructions in the documentation are for running Zeppelin in embedded mode on the lead, where everything goes directly to the SnappyData cluster.
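As a rough sketch of what a notebook paragraph could look like in this mode (hostnames and the table name are made up, and it assumes the SnappyData Spark connector jar is on the interpreter's classpath):

```scala
// Sketch of a %spark paragraph in smart connector mode.
// Assumes the Zeppelin Spark interpreter is configured with:
//   master                      = spark://<spark-master-host>:7077
//   spark.snappydata.connection = <snappydata-locator-host>:1527
import org.apache.spark.sql.SnappySession

val snappy = new SnappySession(sc)   // wrap the interpreter's existing SparkContext

// Illustrative DDL/DML; "example_table" is a placeholder name.
snappy.sql("CREATE TABLE IF NOT EXISTS example_table (id INT, val STRING) USING column")
snappy.sql("INSERT INTO example_table VALUES (1, 'a'), (2, 'b')")

// The query runs as Spark jobs on the standalone cluster (visible on its master UI),
// reading the data from the SnappyData store over the smart connector.
snappy.table("example_table").show()
```

With this setup the Spark master console should show the application and its jobs, while the SnappyData cluster acts only as the data store.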
Note, however, that this mode will give quite a bit lower performance than the embedded mode. Unless there are specific requirements for it, it is recommended to use the embedded mode for the best experience. The two cases we know of where it can be useful are: a) an existing Spark installation/app that one wants to use and cannot run in embedded mode due to pre-existing infrastructure, and b) expanding the execution capacity of the cluster if CPU in embedded mode is a bottleneck and expanding the embedded cluster itself is not feasible for some reason. The embedded mode as-is can scale quite high, including client connections (JDBC/ODBC/thrift), because every node can host a thrift server, unlike Spark's hive-thriftserver.