CaffeOnSpark
Error in "Train a DNN network using CaffeOnSpark with 2 Spark executors"
Hi, after successfully building CaffeOnSpark, I ran:

    ./spark-submit --master ${MASTER_URL} \
        --files ${CAFFE_ON_SPARK}/data/lenet_memory_solver.prototxt,${CAFFE_ON_SPARK}/data/lenet_memory_train_test.prototxt \
        --conf spark.cores.max=${TOTAL_CORES} \
        --conf spark.task.cpus=${CORES_PER_WORKER} \
        --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
        --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
        --class com.yahoo.ml.caffe.CaffeOnSpark \
        ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train \
        -features accuracy,loss -label label \
        -conf lenet_memory_solver.prototxt \
        -clusterSize ${SPARK_WORKER_INSTANCES} \
        -devices 1 \
        -connection ethernet \
        -model file:${CAFFE_ON_SPARK}/mnist_lenet.model \
        -output file:${CAFFE_ON_SPARK}/lenet_features_result
It failed with the following error:

    Error: Cannot load main class from JAR file:/data/lenet_memory_solver.prototxt,/data/lenet_memory_train_test.prototxt
    Run with --help for usage help or --verbose for debug output

Can someone help me, please? Thanks so much.
It seems to be mixing up the prototxt files and caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar. What are your environment variables set to? What kind of cluster are you working on? The default one from the wiki page?
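EDIT: I guess your CAFFE_ON_SPARK variable isn't set; it should be something like /opt/CaffeOnSpark. The paths in the error start directly with /data/, which is what ${CAFFE_ON_SPARK}/data/... expands to when the variable is empty.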
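One way to test that guess before rerunning is to check that every variable the command references is actually set in the shell you launch spark-submit from. The snippet below is only a sketch: `check_vars` is a hypothetical helper name, not part of Spark or CaffeOnSpark, and the list of variables is taken from the command in the question. It may also be worth looking at MASTER_URL in particular: if it were empty, the unquoted `--master ${MASTER_URL}` would collapse to a bare `--master`, and I believe spark-submit would then take `--files` as the master value and treat the prototxt list as the application JAR, which would match the error above.

```shell
# Hypothetical helper: report every listed variable that is unset or empty.
# Returns 0 when all of them have a value, 1 otherwise.
check_vars() {
  status=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Variables referenced by the spark-submit command in the question.
if check_vars MASTER_URL CAFFE_ON_SPARK TOTAL_CORES CORES_PER_WORKER SPARK_WORKER_INSTANCES; then
  echo "all variables are set"
else
  echo "set the variables listed above before running spark-submit"
fi
```

If any of them shows up as unset, export it in the same shell (for example `export CAFFE_ON_SPARK=/opt/CaffeOnSpark`, adjusted to wherever you built it) and try again.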