
something wrong with standalone cluster

Open · leadtekleadtek opened this issue 9 years ago • 3 comments

I have two nodes and I want to train the mnist example on them, but it gives this error:

ERROR CaffeOnSpark: Requested # of executors: 2 actual # of executors:1. Please try to set --conf spark.scheduler.maxRegisteredResourcesWaitingTime with a large value (default 30s)
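For reference, the setting the error message suggests goes on the spark-submit line; a minimal sketch (120s is an arbitrary example value, and raising the wait only helps if the second worker is slow to register, not if it never registers at all):

spark-submit --master ${MASTER_URL} \
--conf spark.scheduler.maxRegisteredResourcesWaitingTime=120s \
... (remaining arguments unchanged)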

my command on master node is below:

export MASTER_URL=spark://masterpc:7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}

my command on slave node is below:

export MASTER_URL=spark://masterpc:7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}

then, my command on master node is below:

spark-submit --master ${MASTER_URL} \
--files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
--conf spark.cores.max=2 \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
--conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
--class com.yahoo.ml.caffe.CaffeOnSpark \
${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
-train \
-features accuracy,loss -label label \
-conf lenet_memory_solver.prototxt \
-clusterSize 2 \
-devices 1 \
-connection ethernet \
-model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
-output file:${CAFFE_ON_SPARK}/lenet_features_result
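For the submit above to get two executors, the standalone master must be running and both workers registered before the job starts. A sketch of the expected startup sequence, assuming the stock Spark sbin scripts and the default master web UI on port 8080:

# on masterpc, before any worker is started
${SPARK_HOME}/sbin/start-master.sh

# then on each of the two nodes, pointing at the master
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER_URL}

# rough check: each registered worker shows up as ALIVE in the master UI
curl -s http://masterpc:8080 | grep -o ALIVE

With spark.cores.max=2 and one core per worker, the job can only get its two executors when both workers are ALIVE.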
Can anyone tell me how to train mnist on a standalone cluster? Thanks!

leadtekleadtek avatar Nov 03 '16 02:11

The error says "Requested # of executors: 2 actual # of executors:1", so somehow only 1 executor was available. Some of the cluster/job settings may be incorrect. Check CORES_PER_WORKER, TOTAL_CORES, etc.
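For reference, a quick sanity check of those values on each node before starting the workers. Note the * inside $(( )): without it the arithmetic expansion just concatenates the two values (e.g. 1 and 1 become 11) instead of multiplying them, and the original command above was pasted without it:

export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
# expect: instances=1 cores/worker=1 total=1
echo "instances=${SPARK_WORKER_INSTANCES} cores/worker=${CORES_PER_WORKER} total=${TOTAL_CORES}"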

junshi15 avatar Nov 03 '16 19:11

If this works for you, then please update here and close the bug. Thanks

mriduljain avatar Nov 29 '16 06:11

I have the same problem. From the Spark WebUI there are two workers alive, but if I set the variables as below, only one worker is used.

export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))

When I use spark-submit with the config below, the error occurs:

--conf spark.cores.max=2 \
-clusterSize 2 \
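For those two numbers to be satisfiable, they have to line up with the registered worker resources. A rough rule of thumb, inferred from how CaffeOnSpark requests executors rather than quoted from its docs:

# CaffeOnSpark asks for -clusterSize executors, each running one task
# of spark.task.cpus cores, so the job only starts cleanly when
#   spark.cores.max >= clusterSize * spark.task.cpus
# and that many cores are actually free across ALIVE workers
--conf spark.cores.max=2 \
--conf spark.task.cpus=1 \
-clusterSize 2 \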

eaglew94 avatar Feb 20 '17 13:02