Something wrong with standalone cluster
I have two nodes and I want to train the MNIST example on them, but it gives this error:
ERROR CaffeOnSpark: Requested # of executors: 2 actual # of executors:1. Please try to set --conf spark.scheduler.maxRegisteredResourcesWaitingTime with a large value (default 30s)
My commands on the master node are below:
export MASTER_URL=spark://masterpc:7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}
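Note: the standalone master must already be running on masterpc before any worker can register. A minimal sketch using the stock Spark script (the --host/--port values just mirror MASTER_URL above):
# run once on masterpc, before starting any workers
${SPARK_HOME}/sbin/start-master.sh --host masterpc --port 7077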
My commands on the slave node are below:
export MASTER_URL=spark://masterpc:7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}
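Before submitting, it is worth confirming that both workers actually registered with the master. A hedged check, assuming the master web UI is on its default port 8080 (the UI page lists every alive worker):
# should show two worker entries if both nodes registered
curl -s http://masterpc:8080 | grep -i worker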
Then my spark-submit command on the master node is below:
spark-submit --master ${MASTER_URL} \
--files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
--conf spark.cores.max=2 \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_memory_solver.prototxt \
-clusterSize 2 \
-devices 1 \
-connection ethernet \
-model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
-output file:${CAFFE_ON_SPARK}/lenet_features_result
Can anyone tell me how to train MNIST on a standalone cluster? Thanks!
The error says "Requested # of executors: 2 actual # of executors:1". Somehow only 1 executor was available, so some of the cluster/job settings may be incorrect. Check CORES_PER_WORKER, TOTAL_CORES, etc.
If this works for you, then please update here and close the bug. Thanks.
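The error message itself points at one knob worth trying. A hedged sketch of two extra submit flags (the 120s value is illustrative; spark.executor.cores is a standard Spark setting, not CaffeOnSpark-specific):
# give slow workers more time to register before the job is scheduled
--conf spark.scheduler.maxRegisteredResourcesWaitingTime=120s \
# cap each executor at one core, so spark.cores.max=2 yields two executors
--conf spark.executor.cores=1 \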
I have the same problem. From the Spark Web UI there are two workers alive, but if I set the variables as below, only one worker is used.
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
When I use spark-submit with the config below, the error occurs:
--conf spark.cores.max=2
-clusterSize 2 \
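Not a definitive fix, but a quick way to sanity-check how many executors the cluster actually grants under these settings, assuming spark-shell from the same Spark install:
${SPARK_HOME}/bin/spark-shell --master spark://masterpc:7077 \
--conf spark.cores.max=2 --conf spark.task.cpus=1
# then inside the shell:
# scala> sc.getExecutorMemoryStatus.size  // driver + executors; expect 3 if both workers serve an executor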