
Setting of Spark worker instances

githubier opened this issue 8 years ago · 3 comments

I have three nodes in my Spark cluster (one master and two slaves), and each node has four cores. How should I set SPARK_WORKER_INSTANCES and CORES_PER_WORKER so that all 12 cores are utilized? In other words, how should I fill in the commands below?

export SPARK_WORKER_INSTANCES= ?
export CORES_PER_WORKER= ?

Should I add the commands below to the .bashrc on all nodes?

export MASTER_URL=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}
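As a rough sketch of one possible per-node setup, assuming Spark standalone mode, a worker running on every node (including the master), and placeholder hostname and memory values:

```bash
# Sketch only: per-node setup for a 3-node standalone cluster with 4 cores per node.
# Assumptions: a worker runs on every node (including the master), so that
# 3 nodes x 4 cores = 12 cores are available; hostname and 3G memory are placeholders.
MASTER_HOST=master-node                          # replace with the master's actual hostname
export MASTER_URL=spark://${MASTER_HOST}:7077    # the same master URL on every node
export SPARK_WORKER_INSTANCES=1                  # one worker process per node
export CORES_PER_WORKER=4                        # give the worker all 4 cores of this node

# Register this node's worker with the master
${SPARK_HOME}/sbin/start-slave.sh -c ${CORES_PER_WORKER} -m 3G ${MASTER_URL}

# spark.cores.max for the job should be the cluster-wide total, not a per-node value
export TOTAL_CORES=$((3 * SPARK_WORKER_INSTANCES * CORES_PER_WORKER))   # 12
```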

githubier avatar Jan 13 '17 13:01 githubier

No matter what numbers I change, the cluster always uses 4 cores in total and the training gets nowhere. Any ideas?

githubier avatar Jan 16 '17 13:01 githubier

The different flags and their values depend on the hardware specification of your own system.

Say your system has 8 cores and you start 1 master and 2 workers with 2 cores per worker; then you'll have 2 cores left for the rest of the system's work. The same goes for memory.

The flags can be set each time you log in to your machine, or you can add them to the .bashrc file and forget about the configuration once you have found the best one.
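As an illustrative sketch only (the path, hostname, and memory values below are assumptions, not values from this thread), the .bashrc approach for the 8-core example could look like this:

```bash
# Sketch: possible ~/.bashrc entries for the 8-core example above (values are assumptions).
export SPARK_HOME=/opt/spark                      # adjust to your Spark installation
export MASTER_URL=spark://$(hostname):7077        # assumes this node runs the master
export SPARK_WORKER_INSTANCES=2                   # two worker processes on this machine
export CORES_PER_WORKER=2                         # two cores per worker
export TOTAL_CORES=$((CORES_PER_WORKER * SPARK_WORKER_INSTANCES))   # 4 cores for Spark

# Start the master and the workers with the chosen core/memory limits
# (start-slave.sh reads SPARK_WORKER_INSTANCES and launches that many workers)
${SPARK_HOME}/sbin/start-master.sh
${SPARK_HOME}/sbin/start-slave.sh -c ${CORES_PER_WORKER} -m 4G ${MASTER_URL}
```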

Read more here and here. You could also search for Spark resource allocation or something similar online.

arundasan91 avatar Jan 17 '17 21:01 arundasan91

I feel like CaffeOnSpark on clusters other than YARN is really rigid. You have to make sure that clustersize * executorcores = coresmax, and that batchsize >= trainsteps * devices * clustersize; at least this works for me.
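To make that arithmetic concrete, here is a small sketch using the 3-node, 4-cores-per-node cluster from this thread; the variable names and the values chosen for devices and train steps are assumptions for illustration, not prescriptions from the comment above:

```bash
# Sketch of the two constraints above, for 3 nodes with 4 cores each.
CLUSTER_SIZE=3            # number of executors, assumed one per node
EXECUTOR_CORES=4          # cores given to each executor
CORES_MAX=$((CLUSTER_SIZE * EXECUTOR_CORES))          # spark.cores.max must equal 12

DEVICES=1                 # assumed devices per executor (CaffeOnSpark's -devices flag)
TRAIN_STEPS=1             # assumed training steps per synchronization
MIN_BATCH=$((TRAIN_STEPS * DEVICES * CLUSTER_SIZE))   # batch size must be >= 3

echo "spark.cores.max=${CORES_MAX}, minimum batch size=${MIN_BATCH}"
```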

baristahell avatar Jan 23 '17 10:01 baristahell