Configuration of SnappyData cluster with standalone Spark cluster
Hello, I'm stuck on SnappyData cluster configuration. I have 2 hosts with a Spark cluster:
- host 1: master + slave
- host 2: slave

I want to run a SnappyData cluster on those hosts:
- host 1: lead + locator + data server
- host 2: standby locator + data server

I need to use SnappyData as the database for my Spark cluster. I created the <snappy_home_folder>/conf/spark-env.sh file and set SPARK_MASTER_HOST and SPARK_MASTER_PORT for my current Spark standalone cluster.
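Roughly, that file looks like this (the host name and port below are placeholders, not my real values):

```sh
# <snappy_home_folder>/conf/spark-env.sh -- points the scripts at the existing standalone master
export SPARK_MASTER_HOST=host1
export SPARK_MASTER_PORT=7077
```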
I configured the locators file for SnappyData and I'm trying to set its content, but I probably don't understand the idea:
- Do I have to set -spark.executor.cores= and -spark.ui.port= ? If yes, are those values for the standalone Spark cluster or for the Snappy embedded Spark?
- If those values are for the standalone Spark cluster, do I set -spark.ui.port= for each host, i.e. 1 or 2 values? I have two workers.
Thank you in advance. Regards, Tom
@tommir1 For SnappyData configuration you can follow http://snappydatainc.github.io/snappydata/configuring_cluster/configuring_cluster/ . There you can see that the different parameters are set in the corresponding property files, e.g. conf/locators, conf/leads & conf/servers. For Spark you can follow the steps as you have mentioned. -spark.ui.port is for the Spark driver or the Snappy lead node only, where a dashboard is launched. This dashboard shows aggregated values from the different worker nodes. Hope I clarified your doubt.
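For reference, for the two-host layout you described, the SnappyData conf files would look roughly like this (host names, ports and directories are placeholders; see the configuring_cluster page above for the full list of properties):

```
# conf/locators -- one locator per line
host1 -peer-discovery-port=10334 -dir=/opt/snappydata/work/locator1
host2 -peer-discovery-port=10334 -dir=/opt/snappydata/work/locator2

# conf/leads -- the lead runs only on host 1; per-lead Spark properties such as
# -spark.ui.port and -spark.executor.cores go here
host1 -locators=host1:10334,host2:10334 -spark.ui.port=5050 -spark.executor.cores=4 -dir=/opt/snappydata/work/lead1

# conf/servers -- one data server per host
host1 -locators=host1:10334,host2:10334 -dir=/opt/snappydata/work/server1
host2 -locators=host1:10334,host2:10334 -dir=/opt/snappydata/work/server2
```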
Thank you for your answer. Please correct me if I'm wrong:
- Do I have to set -spark.executor.cores= even if I don't want to use the embedded Spark? When I'm using SnappyData as a database, is the embedded Spark up and running?

But when I have the proper config in <snappy_home_folder>/conf/spark-env.sh (for the standalone Spark cluster) and I set -spark.ui.port, which Spark's dashboard do I get?

I'm asking because I'm also trying to use /bin/spark-submit with Snappy, and I don't see any activity on my Spark standalone master console, only in the dashboard. I'm a little confused -> why don't I see the application in the Spark standalone console?
Thanks a lot again for your help.
@tommir1
Do I have to set -spark.executor.cores= even if I don't want to use the embedded Spark?
When you say embedded Spark, do you mean the SnappyData engine? I did not quite get your question. But if you are not using SnappyData, then the executor cores etc. should be set as per the Apache Spark guidelines. In most cases you can leave them at the defaults.
When I'm using SnappyData as a database, is the embedded Spark up and running?
Yes, you need to start the SnappyData cluster. The cluster has a lead node, which is equivalent to the driver process of Spark, and the servers play the role of executors. The servers also store the data.
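A minimal sketch of starting the whole cluster, assuming a standard SnappyData install and conf files like the ones sketched above:

```sh
# starts every locator, server and lead listed in conf/locators, conf/servers and conf/leads
./sbin/snappy-start-all.sh

# ...and stops them all again when you are done
./sbin/snappy-stop-all.sh
```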
I'm asking because I'm also trying to use /bin/spark-submit with Snappy, and I don't see any activity on my Spark standalone master console, only in the dashboard. I'm a little confused -> why don't I see the application in the Spark standalone console?
Use bin/snappy-job.sh (rather than spark-submit) to submit jobs to the SnappyData cluster. I am assuming you want to use the embedded mode of SnappyData; SnappyData works in two modes. Please check this HowTo for how to submit jobs to SnappyData: https://snappydatainc.github.io/snappydata/howto/run_spark_job_inside_cluster/
There are other examples in the section called HowTos, where we have tried to give a lot of code samples. That should help.
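As a rough illustration of embedded mode (the class, table, jar path and lead host below are placeholders, not taken from your setup), a job extends SnappySQLJob and is then submitted through snappy-job.sh:

```scala
import com.typesafe.config.Config
import org.apache.spark.sql.{SnappyJobValid, SnappyJobValidation, SnappySQLJob, SnappySession}

// Minimal embedded-mode job sketch; it runs inside the lead node,
// while the servers act as executors and also store the data.
object MyAggregationJob extends SnappySQLJob {

  override def isValidJob(snappy: SnappySession, config: Config): SnappyJobValidation = SnappyJobValid()

  override def runSnappyJob(snappy: SnappySession, jobConfig: Config): Any = {
    snappy.sql("CREATE TABLE IF NOT EXISTS t1 (id INT, value DOUBLE) USING column")
    snappy.sql("SELECT COUNT(*) FROM t1").collect()(0).getLong(0)
  }
}
```

Submitted roughly like this (8090 is the usual lead REST port):

```sh
./bin/snappy-job.sh submit --lead host1:8090 --app-name my-agg-job \
  --class MyAggregationJob --app-jar /path/to/my-job.jar
```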
Let us know if you have more questions.
Thank you. I did some tests, and I would like to ask a few more questions:
- I have one streaming application run via <snappy_folder>/bin/snappy-job.sh submit -> here I'm using an object that extends SnappyStreamingJob with a runSnappyJob method. In the Spark dashboard UI, on the Environment tab, spark.master is set to snappydata://localhost
- The second application loads data from a file, does an aggregation, saves the result to a file and stops. This one is started with bin/spark-submit --master --conf spark.snappydata.connection=localhost:1527 .... (a rough sketch of the full command is below). In the Spark UI dashboard, on the Environment tab, I see that spark.master is set to the correct one, spark://spark_server:7077, but I still don't see anything on the Spark master console -> no active context.

I need to have a streaming application with a continuous query running on my current Spark cluster but integrated with SnappyData. This is the reason for my confusion -> wherever I check while the application is running, I don't see a connection to the standalone Spark cluster.
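Roughly, the second application is launched like this (the class and jar names here are just placeholders; depending on the setup, the SnappyData connector jars may also need to be added with --jars or --packages):

```sh
./bin/spark-submit \
  --master spark://spark_server:7077 \
  --conf spark.snappydata.connection=localhost:1527 \
  --class com.example.MyAggregationApp \
  /path/to/my-aggregation-app.jar
```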
@tommir1 Do your application logs show that it does not proceed at all? As I understood it, your Spark application is not getting launched for some reason. You can check the Spark logs; they might give us a clue.