Configuration of SnappyData cluster with standalone Spark cluster
Hello, I'm stuck on SnappyData cluster configuration. I have 2 hosts with a Spark cluster:
- host 1: master + slave
- host 2: slave

I want to run a SnappyData cluster on those hosts:
- host 1: lead + locator + data server
- host 2: standby locator + data server

I need to use SnappyData as the database for my Spark cluster. I created the <snappy_home_folder>/conf/spark-env.sh file and set SPARK_MASTER_HOST and SPARK_MASTER_PORT for my current Spark standalone cluster.
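Roughly, that file looks like this (the host name and port below are placeholders, not my real values):

```sh
# <snappy_home_folder>/conf/spark-env.sh -- points the scripts at the existing standalone master
export SPARK_MASTER_HOST=host1
export SPARK_MASTER_PORT=7077
```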
I configured the locators file for SnappyData and I'm trying to set its content, but I probably don't understand the idea:
- Do I have to set -spark.executor.cores= and -spark.ui.port= ? If yes, are those values for the standalone Spark cluster or for the Snappy embedded Spark?
- If those values are for the standalone Spark cluster, do I set -spark.ui.port= for each host, i.e. 1 or 2 values? I have two workers.
Thank you in advance. Regards, Tom
@tommir1 For SnappyData configuration you can follow http://snappydatainc.github.io/snappydata/configuring_cluster/configuring_cluster/ . There you can see that the different parameters are set in the corresponding property files, e.g. conf/locators, conf/leads & conf/servers. For Spark you can follow the steps as you have mentioned. -spark.ui.port is for the Spark driver or the Snappy lead node only, where a dashboard is launched. This dashboard shows aggregated values from the different worker nodes. Hope I clarified your doubt.
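For reference, for the two-host layout you described, the SnappyData conf files would look roughly like this (host names, ports and directories are placeholders; see the configuring_cluster page above for the full list of properties):

```
# conf/locators -- one locator per line
host1 -peer-discovery-port=10334 -dir=/opt/snappydata/work/locator1
host2 -peer-discovery-port=10334 -dir=/opt/snappydata/work/locator2

# conf/leads -- the lead runs only on host 1; per-lead Spark properties such as
# -spark.ui.port and -spark.executor.cores go here
host1 -locators=host1:10334,host2:10334 -spark.ui.port=5050 -spark.executor.cores=4 -dir=/opt/snappydata/work/lead1

# conf/servers -- one data server per host
host1 -locators=host1:10334,host2:10334 -dir=/opt/snappydata/work/server1
host2 -locators=host1:10334,host2:10334 -dir=/opt/snappydata/work/server2
```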
Thank you for your answer. Please correct me if I'm wrong:
- Do I have to set -spark.executor.cores= even if I don't want to use the embedded Spark? When I'm using SnappyData as a database, is the embedded Spark up and running?

But when I have the proper config in <snappy_home_folder>/conf/spark-env.sh (for the standalone Spark cluster) and I set -spark.ui.port, which Spark's dashboard do I get?

I'm asking because I'm also trying to use /bin/spark-submit with Snappy, and I don't see any activity on my Spark standalone master console, only in the dashboard. I'm a little confused -> why don't I see the application in the Spark standalone console?
Thanks a lot again for your help.
@tommir1
Do I have to set -spark.executor.cores= even if I don't want to use the embedded Spark?
When you say embedded Spark, do you mean the SnappyData engine? I did not quite get your question. But if you are not using SnappyData, then the executor cores etc. should be set as per the Apache Spark guidelines. In most cases you can leave them at the defaults.
When I'm using SnappyData as a database, is the embedded Spark up and running?
Yes, you need to start the SnappyData cluster. The cluster has a lead node, which is equivalent to the driver process of Spark, and the servers play the role of executors. The servers also store the data.
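A minimal sketch of starting the whole cluster, assuming a standard SnappyData install and conf files like the ones sketched above:

```sh
# starts every locator, server and lead listed in conf/locators, conf/servers and conf/leads
./sbin/snappy-start-all.sh

# ...and stops them all again when you are done
./sbin/snappy-stop-all.sh
```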
I'm asking because I'm also trying to use /bin/spark-submit with Snappy, and I don't see any activity on my Spark standalone master console, only in the dashboard. I'm a little confused -> why don't I see the application in the Spark standalone console?
Use bin/snappy-job.sh (rather than spark-submit) to submit jobs to the SnappyData cluster. I am assuming you want to use the embedded mode of SnappyData; SnappyData works in two modes. Please check this HowTo for how to submit jobs to SnappyData: https://snappydatainc.github.io/snappydata/howto/run_spark_job_inside_cluster/
There are other examples in the section called HowTos, where we have tried to give a lot of code samples. That should help.
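As a rough illustration of embedded mode (the class, table, jar path and lead host below are placeholders, not taken from your setup), a job extends SnappySQLJob and is then submitted through snappy-job.sh:

```scala
import com.typesafe.config.Config
import org.apache.spark.sql.{SnappyJobValid, SnappyJobValidation, SnappySQLJob, SnappySession}

// Minimal embedded-mode job sketch; it runs inside the lead node,
// while the servers act as executors and also store the data.
object MyAggregationJob extends SnappySQLJob {

  override def isValidJob(snappy: SnappySession, config: Config): SnappyJobValidation = SnappyJobValid()

  override def runSnappyJob(snappy: SnappySession, jobConfig: Config): Any = {
    snappy.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, value DOUBLE) USING column")
    snappy.sql("SELECT COUNT(*) FROM t1").collect()(0).getLong(0)
  }
}
```

Submitted roughly like this (8090 is the usual lead REST port):

```sh
./bin/snappy-job.sh submit --lead host1:8090 --app-name my-agg-job \
  --class MyAggregationJob --app-jar /path/to/my-job.jar
```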
Let us know if you have more questions.
Thank you. I did some tests, and I would like to ask a few more questions:
- I have one streaming application run via <snappy_folder>/bin/snappy-job.sh submit -> here I'm using an object that extends SnappyStreamingJob with a runSnappyJob method. In the Spark dashboard UI, on the Environment tab, spark.master is set to snappydata://localhost
- The second application loads data from a file, does an aggregation, saves the result to a file and stops. This one is started with bin/spark-submit --master --conf spark.snappydata.connection=localhost:1527 .... (a rough sketch of the full command is below). In the Spark UI dashboard, on the Environment tab, I see that spark.master is set to the correct one, spark://spark_server:7077, but I still don't see anything on the Spark master console -> no active context.

I need to have a streaming application with a continuous query running on my current Spark cluster but integrated with SnappyData. This is the reason for my confusion -> wherever I check while the application is running, I don't see a connection to the standalone Spark cluster.
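Roughly, the second application is launched like this (the class and jar names here are just placeholders; depending on the setup, the SnappyData connector jars may also need to be added with --jars or --packages):

```sh
./bin/spark-submit \
  --master spark://spark_server:7077 \
  --conf spark.snappydata.connection=localhost:1527 \
  --class com.example.MyAggregationApp \
  /path/to/my-aggregation-app.jar
```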
@tommir1 Do your application logs show that it does not proceed at all? As I understood it, your Spark application is not getting launched for some reason. You can check the Spark logs; they might give us a clue.