CaffeOnSpark
Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues.
I used CaffeOnSpark to load an external caffemodel and run a test, but encountered difficulties.
I ran the CaffeOnSpark job on a Spark standalone cluster; every executor exits with code 134 and the task fails with "Remote RPC client disassociated". My configuration and log follow.
Please give me some suggestions or a solution. Thank you!
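For reference, the job was submitted roughly as follows. This is a reconstruction from the driver log below: the master URL, jar, --files entries, and executor memory are taken from the log, while the trailing CaffeOnSpark flags and the caffemodel path are assumptions (the WARN line shows that both -test and -features were passed).

# reconstructed submit command; everything after -test, and the model path, are assumptions
spark-submit --master spark://bigdata4.gds.com:7077 \
  --files /opt/soft/caffeonspark/CaffeOnSpark-master/data/solver.prototxt,/opt/soft/caffeonspark/CaffeOnSpark-master/data/train_valx.prototxt \
  --conf spark.executor.memory=1g \
  --class com.yahoo.ml.caffe.CaffeOnSpark \
  /opt/soft/caffeonspark/CaffeOnSpark-master/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
  -test \
  -conf solver.prototxt \
  -model file:///path/to/my.caffemodel \
  -features accuracy \
  -output file:///tmp/test_result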
run log
16/12/16 16:10:13 INFO spark.SecurityManager: Changing modify acls to: root
16/12/16 16:10:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/12/16 16:10:14 INFO util.Utils: Successfully started service 'sparkDriver' on port 48835.
16/12/16 16:10:14 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/12/16 16:10:14 INFO Remoting: Starting remoting
16/12/16 16:10:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.15.204:49460]
16/12/16 16:10:14 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 49460.
16/12/16 16:10:14 INFO spark.SparkEnv: Registering MapOutputTracker
16/12/16 16:10:14 INFO spark.SparkEnv: Registering BlockManagerMaster
16/12/16 16:10:14 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-741618f2-5ef8-484d-99cd-d6cc32a31fce
16/12/16 16:10:14 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB
16/12/16 16:10:14 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/12/16 16:10:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/12/16 16:10:14 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/12/16 16:10:14 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/12/16 16:10:14 INFO ui.SparkUI: Started SparkUI at http://192.168.15.204:4040
16/12/16 16:10:14 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/httpd-b0c8c11f-0bb0-446b-ac70-8edf192269dd
16/12/16 16:10:14 INFO spark.HttpServer: Starting HTTP Server
16/12/16 16:10:14 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/12/16 16:10:14 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50747
16/12/16 16:10:14 INFO util.Utils: Successfully started service 'HTTP file server' on port 50747.
16/12/16 16:10:14 INFO spark.SparkContext: Added JAR file:/opt/soft/caffeonspark/CaffeOnSpark-master/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar at http://192.168.15.204:50747/jars/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar with timestamp 1481875814627
16/12/16 16:10:14 INFO util.Utils: Copying /opt/soft/caffeonspark/CaffeOnSpark-master/data/solver.prototxt to /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/userFiles-1190a2f1-d59f-45aa-873f-c28d7b87b4a6/solver.prototxt
16/12/16 16:10:14 INFO spark.SparkContext: Added file file:/opt/soft/caffeonspark/CaffeOnSpark-master/data/solver.prototxt at http://192.168.15.204:50747/files/solver.prototxt with timestamp 1481875814673
16/12/16 16:10:14 INFO util.Utils: Copying /opt/soft/caffeonspark/CaffeOnSpark-master/data/train_valx.prototxt to /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/userFiles-1190a2f1-d59f-45aa-873f-c28d7b87b4a6/train_valx.prototxt
16/12/16 16:10:14 INFO spark.SparkContext: Added file file:/opt/soft/caffeonspark/CaffeOnSpark-master/data/train_valx.prototxt at http://192.168.15.204:50747/files/train_valx.prototxt with timestamp 1481875814677
16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Connecting to master spark://bigdata4.gds.com:7077...
16/12/16 16:10:14 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20161216161014-0002
16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/0 on worker-20161216160701-192.168.15.204-41550 (192.168.15.204:41550) with 1 cores
16/12/16 16:10:14 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51913.
16/12/16 16:10:14 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/0 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM
16/12/16 16:10:14 INFO netty.NettyBlockTransferService: Server created on 51913
16/12/16 16:10:14 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/12/16 16:10:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.15.204:51913 with 511.1 MB RAM, BlockManagerId(driver, 192.168.15.204, 51913)
16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/0 is now RUNNING
16/12/16 16:10:14 INFO storage.BlockManagerMaster: Registered BlockManager
16/12/16 16:10:15 INFO scheduler.EventLoggingListener: Logging events to hdfs://bigdata4.gds.com:9000/logs/eventLogs/app-20161216161014-0002
16/12/16 16:10:16 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53282) with ID 0
16/12/16 16:10:16 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 1.0
16/12/16 16:10:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:55499 with 511.1 MB RAM, BlockManagerId(0, bigdata4.gds.com, 55499)
16/12/16 16:10:17 INFO caffe.DataSource$: Source data layer:1
16/12/16 16:10:17 INFO caffe.LMDB: Batch size:10
16/12/16 16:10:17 WARN caffe.Config: both -test and -features are found, we will do test only (as it is latest), disabling feature mode.
16/12/16 16:10:17 INFO caffe.CaffeOnSpark: Method : test, Message : start Test
16/12/16 16:10:17 INFO caffe.CaffeOnSpark: Method : features2, Message : start features2 .....
16/12/16 16:10:17 INFO caffe.LmdbRDD: local LMDB path:/opt/soft/caffeonspark/CaffeOnSpark-master/data/val_db
16/12/16 16:10:17 INFO caffe.LmdbRDD: 1 LMDB RDD partitions
16/12/16 16:10:17 INFO spark.SparkContext: Starting job: count at CaffeOnSpark.scala:435
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Got job 0 (count at CaffeOnSpark.scala:435) with 1 output partitions
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at CaffeOnSpark.scala:435)
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at filter at LMDB.scala:36), which has no missing parents
16/12/16 16:10:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.3 KB, free 3.3 KB)
16/12/16 16:10:17 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.1 KB, free 5.3 KB)
16/12/16 16:10:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.15.204:51913 (size: 2.1 KB, free: 511.1 MB)
16/12/16 16:10:17 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/12/16 16:10:17 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at filter at LMDB.scala:36)
16/12/16 16:10:17 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/12/16 16:10:17 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2115 bytes)
16/12/16 16:10:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on bigdata4.gds.com:55499 (size: 2.1 KB, free: 511.1 MB)
16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added rdd_1_0 on disk on bigdata4.gds.com:55499 (size: 30.3 MB)
16/12/16 16:10:18 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 527 ms on bigdata4.gds.com (1/1)
16/12/16 16:10:18 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/12/16 16:10:18 INFO scheduler.DAGScheduler: ResultStage 0 (count at CaffeOnSpark.scala:435) finished in 0.531 s
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Job 0 finished: count at CaffeOnSpark.scala:435, took 0.701952 s
16/12/16 16:10:18 INFO caffe.CaffeOnSpark: Method : features2, Message : srcDataRDD Count : 247
16/12/16 16:10:18 INFO spark.SparkContext: Starting job: count at CaffeOnSpark.scala:444
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Got job 1 (count at CaffeOnSpark.scala:444) with 1 output partitions
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at CaffeOnSpark.scala:444)
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Missing parents: List()
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:439), which has no missing parents
16/12/16 16:10:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 9.3 KB)
16/12/16 16:10:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 11.8 KB)
16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.15.204:51913 (size: 2.5 KB, free: 511.1 MB)
16/12/16 16:10:18 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/12/16 16:10:18 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:439)
16/12/16 16:10:18 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
16/12/16 16:10:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes)
16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:55499 (size: 2.5 KB, free: 511.1 MB)
16/12/16 16:10:19 ERROR scheduler.TaskSchedulerImpl: Lost executor 0 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:19 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, bigdata4.gds.com): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:19 INFO scheduler.DAGScheduler: Executor lost: 0 (epoch 0)
16/12/16 16:10:19 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster.
16/12/16 16:10:19 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(0, bigdata4.gds.com, 55499)
16/12/16 16:10:19 INFO storage.BlockManagerMaster: Removed 0 successfully in removeExecutor
16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/0 is now EXITED (Command exited with code 134)
16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/0 removed: Command exited with code 134
16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/1 on worker-20161216160701-192.168.15.204-41550 (192.168.15.204:41550) with 1 cores
16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/1 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM
16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/1 is now RUNNING
16/12/16 16:10:20 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53289) with ID 1
16/12/16 16:10:20 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes)
16/12/16 16:10:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:50688 with 511.1 MB RAM, BlockManagerId(1, bigdata4.gds.com, 50688)
16/12/16 16:10:20 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:50688 (size: 2.5 KB, free: 511.1 MB)
16/12/16 16:10:21 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:21 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, bigdata4.gds.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:21 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 1)
16/12/16 16:10:21 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/12/16 16:10:21 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, bigdata4.gds.com, 50688)
16/12/16 16:10:21 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/1 is now EXITED (Command exited with code 134)
16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/1 removed: Command exited with code 134
16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/2 on worker-20161216160701-192.168.15.204-41550 (192.168.15.204:41550) with 1 cores
16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/2 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM
16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/2 is now RUNNING
16/12/16 16:10:22 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53297) with ID 2
16/12/16 16:10:22 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes)
16/12/16 16:10:22 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:47828 with 511.1 MB RAM, BlockManagerId(2, bigdata4.gds.com, 47828)
16/12/16 16:10:23 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:47828 (size: 2.5 KB, free: 511.1 MB)
16/12/16 16:10:24 ERROR scheduler.TaskSchedulerImpl: Lost executor 2 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:24 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3, bigdata4.gds.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:24 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 2)
16/12/16 16:10:24 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
16/12/16 16:10:24 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, bigdata4.gds.com, 47828)
16/12/16 16:10:24 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/2 is now EXITED (Command exited with code 134)
16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/2 removed: Command exited with code 134
16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/3 on worker-20161216160701-192.168.15.204-41550 (192.168.15.204:41550) with 1 cores
16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/3 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM
16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/3 is now RUNNING
16/12/16 16:10:25 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53304) with ID 3
16/12/16 16:10:25 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes)
16/12/16 16:10:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:38664 with 511.1 MB RAM, BlockManagerId(3, bigdata4.gds.com, 38664)
16/12/16 16:10:25 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.15.204:51913 in memory (size: 2.1 KB, free: 511.1 MB)
16/12/16 16:10:25 INFO spark.ContextCleaner: Cleaned accumulator 1
16/12/16 16:10:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:38664 (size: 2.5 KB, free: 511.1 MB)
16/12/16 16:10:26 ERROR scheduler.TaskSchedulerImpl: Lost executor 3 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:26 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
16/12/16 16:10:26 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
16/12/16 16:10:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/3 is now EXITED (Command exited with code 134)
16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/3 removed: Command exited with code 134
16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
16/12/16 16:10:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/4 on worker-20161216160701-192.168.15.204-41550 (192.168.15.204:41550) with 1 cores
16/12/16 16:10:26 INFO scheduler.DAGScheduler: ResultStage 1 (count at CaffeOnSpark.scala:444) failed in 8.594 s
16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/4 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM
16/12/16 16:10:26 INFO scheduler.DAGScheduler: Executor lost: 3 (epoch 3)
16/12/16 16:10:26 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
16/12/16 16:10:26 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, bigdata4.gds.com, 38664)
16/12/16 16:10:26 INFO scheduler.DAGScheduler: Job 1 failed: count at CaffeOnSpark.scala:444, took 8.607960 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1143)
    at com.yahoo.ml.caffe.CaffeOnSpark.features2(CaffeOnSpark.scala:444)
    at com.yahoo.ml.caffe.CaffeOnSpark.test(CaffeOnSpark.scala:382)
    at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:38)
    at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/12/16 16:10:26 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/12/16 16:10:26 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/4 is now RUNNING
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/12/16 16:10:26 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.15.204:4040
16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/12/16 16:10:26 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/12/16 16:10:26 INFO storage.MemoryStore: MemoryStore cleared
16/12/16 16:10:26 INFO storage.BlockManager: BlockManager stopped
16/12/16 16:10:26 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/12/16 16:10:26 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/12/16 16:10:26 INFO spark.SparkContext: Successfully stopped SparkContext
16/12/16 16:10:26 INFO util.ShutdownHookManager: Shutdown hook called
16/12/16 16:10:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/httpd-b0c8c11f-0bb0-446b-ac70-8edf192269dd
16/12/16 16:10:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029
16/12/16 16:10:26 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
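Note that every executor exits with code 134, i.e. 128 + signal 6 (SIGABRT): something in native code (likely the Caffe library) is calling abort(), rather than the JVM failing on its own, so the driver log above only shows the secondary "Remote RPC client disassociated" symptom. The real abort message should appear in the executor stderr under the standalone worker's work directory, e.g. (SPARK_HOME and the executor ID 0 are placeholders; the app ID is taken from the log above):

# executor logs of a standalone worker live under $SPARK_HOME/work/<app-id>/<executor-id>/
tail -n 100 $SPARK_HOME/work/app-20161216161014-0002/0/stderr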
solver.prototxt
test_iter: 4
test_interval: 12
base_lr: 0.001
display: 1
max_iter: 360
lr_policy: "exp"
gamma: 0.985852887007
momentum: 0.9
weight_decay: 1e-05
snapshot: 12
snapshot_prefix: "snapshot"
solver_mode: CPU
random_seed: 7
net: "train_valx.prototxt"
solver_type: SGD
train_valx.prototxt
layer { name: "train-data" type: "MemoryData" top: "data" top: "label" include { phase: TRAIN } source_class: "com.yahoo.ml.caffe.LMDB" transform_param { mirror: true crop_size: 224 mean_file: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/mean.binaryproto" } memory_data_param { source: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/train_db" batch_size: 7 channels: 1 height: 256 width: 256 share_in_parallel: false } } layer { name: "val-data" type: "MemoryData" top: "data" top: "label" include { phase: TEST } transform_param { mirror: false crop_size: 224 mean_file: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/mean.binaryproto" } source_class: "com.yahoo.ml.caffe.LMDB" memory_data_param { source: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/val_db" batch_size: 10 channels: 1 height: 256 width: 256 share_in_parallel: false } } layer { name: "conv1/7x7_s2" type: "Convolution" bottom: "data" top: "conv1/7x7_s2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 3 kernel_size: 7 stride: 2 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv1/relu_7x7" type: "ReLU" bottom: "conv1/7x7_s2" top: "conv1/7x7_s2" } layer { name: "pool1/3x3_s2" type: "Pooling" bottom: "conv1/7x7_s2" top: "pool1/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "pool1/norm1" type: "LRN" bottom: "pool1/3x3_s2" top: "pool1/norm1" lrn_param { local_size: 5 alpha: 9.99999974738e-05 beta: 0.75 } } layer { name: "conv2/3x3_reduce" type: "Convolution" bottom: "pool1/norm1" top: "conv2/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv2/relu_3x3_reduce" type: "ReLU" bottom: "conv2/3x3_reduce" top: "conv2/3x3_reduce" } layer { name: "conv2/3x3" type: "Convolution" bottom: "conv2/3x3_reduce" top: "conv2/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv2/relu_3x3" type: "ReLU" bottom: "conv2/3x3" top: "conv2/3x3" } layer { name: "conv2/norm2" type: "LRN" bottom: "conv2/3x3" top: "conv2/norm2" lrn_param { local_size: 5 alpha: 9.99999974738e-05 beta: 0.75 } } layer { name: "pool2/3x3_s2" type: "Pooling" bottom: "conv2/norm2" top: "pool2/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_3a/1x1" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_1x1" type: "ReLU" bottom: "inception_3a/1x1" top: "inception_3a/1x1" } layer { name: "inception_3a/3x3_reduce" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { 
name: "inception_3a/relu_3x3_reduce" type: "ReLU" bottom: "inception_3a/3x3_reduce" top: "inception_3a/3x3_reduce" } layer { name: "inception_3a/3x3" type: "Convolution" bottom: "inception_3a/3x3_reduce" top: "inception_3a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_3x3" type: "ReLU" bottom: "inception_3a/3x3" top: "inception_3a/3x3" } layer { name: "inception_3a/5x5_reduce" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_5x5_reduce" type: "ReLU" bottom: "inception_3a/5x5_reduce" top: "inception_3a/5x5_reduce" } layer { name: "inception_3a/5x5" type: "Convolution" bottom: "inception_3a/5x5_reduce" top: "inception_3a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_5x5" type: "ReLU" bottom: "inception_3a/5x5" top: "inception_3a/5x5" } layer { name: "inception_3a/pool" type: "Pooling" bottom: "pool2/3x3_s2" top: "inception_3a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_3a/pool_proj" type: "Convolution" bottom: "inception_3a/pool" top: "inception_3a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_pool_proj" type: "ReLU" bottom: "inception_3a/pool_proj" top: "inception_3a/pool_proj" } layer { name: "inception_3a/output" type: "Concat" bottom: "inception_3a/1x1" bottom: "inception_3a/3x3" bottom: "inception_3a/5x5" bottom: "inception_3a/pool_proj" top: "inception_3a/output" } layer { name: "inception_3b/1x1" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_1x1" type: "ReLU" bottom: "inception_3b/1x1" top: "inception_3b/1x1" } layer { name: "inception_3b/3x3_reduce" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_3x3_reduce" type: "ReLU" bottom: "inception_3b/3x3_reduce" top: "inception_3b/3x3_reduce" } layer { name: "inception_3b/3x3" type: "Convolution" bottom: "inception_3b/3x3_reduce" top: "inception_3b/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { 
type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_3x3" type: "ReLU" bottom: "inception_3b/3x3" top: "inception_3b/3x3" } layer { name: "inception_3b/5x5_reduce" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_5x5_reduce" type: "ReLU" bottom: "inception_3b/5x5_reduce" top: "inception_3b/5x5_reduce" } layer { name: "inception_3b/5x5" type: "Convolution" bottom: "inception_3b/5x5_reduce" top: "inception_3b/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_5x5" type: "ReLU" bottom: "inception_3b/5x5" top: "inception_3b/5x5" } layer { name: "inception_3b/pool" type: "Pooling" bottom: "inception_3a/output" top: "inception_3b/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_3b/pool_proj" type: "Convolution" bottom: "inception_3b/pool" top: "inception_3b/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_pool_proj" type: "ReLU" bottom: "inception_3b/pool_proj" top: "inception_3b/pool_proj" } layer { name: "inception_3b/output" type: "Concat" bottom: "inception_3b/1x1" bottom: "inception_3b/3x3" bottom: "inception_3b/5x5" bottom: "inception_3b/pool_proj" top: "inception_3b/output" } layer { name: "pool3/3x3_s2" type: "Pooling" bottom: "inception_3b/output" top: "pool3/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_4a/1x1" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_1x1" type: "ReLU" bottom: "inception_4a/1x1" top: "inception_4a/1x1" } layer { name: "inception_4a/3x3_reduce" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_3x3_reduce" type: "ReLU" bottom: "inception_4a/3x3_reduce" top: "inception_4a/3x3_reduce" } layer { name: "inception_4a/3x3" type: "Convolution" bottom: "inception_4a/3x3_reduce" top: "inception_4a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 208 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_3x3" type: "ReLU" bottom: "inception_4a/3x3" top: "inception_4a/3x3" } layer { name: 
"inception_4a/5x5_reduce" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_5x5_reduce" type: "ReLU" bottom: "inception_4a/5x5_reduce" top: "inception_4a/5x5_reduce" } layer { name: "inception_4a/5x5" type: "Convolution" bottom: "inception_4a/5x5_reduce" top: "inception_4a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 48 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_5x5" type: "ReLU" bottom: "inception_4a/5x5" top: "inception_4a/5x5" } layer { name: "inception_4a/pool" type: "Pooling" bottom: "pool3/3x3_s2" top: "inception_4a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4a/pool_proj" type: "Convolution" bottom: "inception_4a/pool" top: "inception_4a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_pool_proj" type: "ReLU" bottom: "inception_4a/pool_proj" top: "inception_4a/pool_proj" } layer { name: "inception_4a/output" type: "Concat" bottom: "inception_4a/1x1" bottom: "inception_4a/3x3" bottom: "inception_4a/5x5" bottom: "inception_4a/pool_proj" top: "inception_4a/output" } layer { name: "loss1/ave_pool" type: "Pooling" bottom: "inception_4a/output" top: "loss1/ave_pool" pooling_param { pool: AVE kernel_size: 5 stride: 3 } } layer { name: "loss1/conv" type: "Convolution" bottom: "loss1/ave_pool" top: "loss1/conv" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0799999982119 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss1/relu_conv" type: "ReLU" bottom: "loss1/conv" top: "loss1/conv" } layer { name: "loss1/fc" type: "InnerProduct" bottom: "loss1/conv" top: "loss1/fc" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 1024 weight_filler { type: "xavier" std: 0.019999999553 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss1/relu_fc" type: "ReLU" bottom: "loss1/fc" top: "loss1/fc" } layer { name: "loss1/drop_fc" type: "Dropout" bottom: "loss1/fc" top: "loss1/fc" dropout_param { dropout_ratio: 0.699999988079 } } layer { name: "loss1/classifier" type: "InnerProduct" bottom: "loss1/fc" top: "loss1/classifier" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 2 weight_filler { type: "xavier" std: 0.0009765625 } bias_filler { type: "constant" value: 0.0 } } } layer { name: "loss1/loss" type: "SoftmaxWithLoss" bottom: "loss1/classifier" bottom: "label" top: "loss1/loss" loss_weight: 0.300000011921 } layer { name: "loss1/top-1" type: "Accuracy" bottom: "loss1/classifier" bottom: "label" top: "loss1/accuracy" include { phase: TEST } } layer { name: "inception_4b/1x1" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/1x1" param { 
lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_1x1" type: "ReLU" bottom: "inception_4b/1x1" top: "inception_4b/1x1" } layer { name: "inception_4b/3x3_reduce" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 112 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_3x3_reduce" type: "ReLU" bottom: "inception_4b/3x3_reduce" top: "inception_4b/3x3_reduce" } layer { name: "inception_4b/3x3" type: "Convolution" bottom: "inception_4b/3x3_reduce" top: "inception_4b/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 224 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_3x3" type: "ReLU" bottom: "inception_4b/3x3" top: "inception_4b/3x3" } layer { name: "inception_4b/5x5_reduce" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 24 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_5x5_reduce" type: "ReLU" bottom: "inception_4b/5x5_reduce" top: "inception_4b/5x5_reduce" } layer { name: "inception_4b/5x5" type: "Convolution" bottom: "inception_4b/5x5_reduce" top: "inception_4b/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_5x5" type: "ReLU" bottom: "inception_4b/5x5" top: "inception_4b/5x5" } layer { name: "inception_4b/pool" type: "Pooling" bottom: "inception_4a/output" top: "inception_4b/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4b/pool_proj" type: "Convolution" bottom: "inception_4b/pool" top: "inception_4b/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_pool_proj" type: "ReLU" bottom: "inception_4b/pool_proj" top: "inception_4b/pool_proj" } layer { name: "inception_4b/output" type: "Concat" bottom: "inception_4b/1x1" bottom: "inception_4b/3x3" bottom: "inception_4b/5x5" bottom: "inception_4b/pool_proj" top: "inception_4b/output" } layer { name: "inception_4c/1x1" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_1x1" type: "ReLU" bottom: "inception_4c/1x1" top: "inception_4c/1x1" } layer { name: 
"inception_4c/3x3_reduce" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_3x3_reduce" type: "ReLU" bottom: "inception_4c/3x3_reduce" top: "inception_4c/3x3_reduce" } layer { name: "inception_4c/3x3" type: "Convolution" bottom: "inception_4c/3x3_reduce" top: "inception_4c/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_3x3" type: "ReLU" bottom: "inception_4c/3x3" top: "inception_4c/3x3" } layer { name: "inception_4c/5x5_reduce" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 24 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_5x5_reduce" type: "ReLU" bottom: "inception_4c/5x5_reduce" top: "inception_4c/5x5_reduce" } layer { name: "inception_4c/5x5" type: "Convolution" bottom: "inception_4c/5x5_reduce" top: "inception_4c/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_5x5" type: "ReLU" bottom: "inception_4c/5x5" top: "inception_4c/5x5" } layer { name: "inception_4c/pool" type: "Pooling" bottom: "inception_4b/output" top: "inception_4c/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4c/pool_proj" type: "Convolution" bottom: "inception_4c/pool" top: "inception_4c/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_pool_proj" type: "ReLU" bottom: "inception_4c/pool_proj" top: "inception_4c/pool_proj" } layer { name: "inception_4c/output" type: "Concat" bottom: "inception_4c/1x1" bottom: "inception_4c/3x3" bottom: "inception_4c/5x5" bottom: "inception_4c/pool_proj" top: "inception_4c/output" } layer { name: "inception_4d/1x1" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 112 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_1x1" type: "ReLU" bottom: "inception_4d/1x1" top: "inception_4d/1x1" } layer { name: "inception_4d/3x3_reduce" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 144 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: 
"inception_4d/relu_3x3_reduce" type: "ReLU" bottom: "inception_4d/3x3_reduce" top: "inception_4d/3x3_reduce" } layer { name: "inception_4d/3x3" type: "Convolution" bottom: "inception_4d/3x3_reduce" top: "inception_4d/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 288 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_3x3" type: "ReLU" bottom: "inception_4d/3x3" top: "inception_4d/3x3" } layer { name: "inception_4d/5x5_reduce" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_5x5_reduce" type: "ReLU" bottom: "inception_4d/5x5_reduce" top: "inception_4d/5x5_reduce" } layer { name: "inception_4d/5x5" type: "Convolution" bottom: "inception_4d/5x5_reduce" top: "inception_4d/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_5x5" type: "ReLU" bottom: "inception_4d/5x5" top: "inception_4d/5x5" } layer { name: "inception_4d/pool" type: "Pooling" bottom: "inception_4c/output" top: "inception_4d/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4d/pool_proj" type: "Convolution" bottom: "inception_4d/pool" top: "inception_4d/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_pool_proj" type: "ReLU" bottom: "inception_4d/pool_proj" top: "inception_4d/pool_proj" } layer { name: "inception_4d/output" type: "Concat" bottom: "inception_4d/1x1" bottom: "inception_4d/3x3" bottom: "inception_4d/5x5" bottom: "inception_4d/pool_proj" top: "inception_4d/output" } layer { name: "loss2/ave_pool" type: "Pooling" bottom: "inception_4d/output" top: "loss2/ave_pool" pooling_param { pool: AVE kernel_size: 5 stride: 3 } } layer { name: "loss2/conv" type: "Convolution" bottom: "loss2/ave_pool" top: "loss2/conv" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0799999982119 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss2/relu_conv" type: "ReLU" bottom: "loss2/conv" top: "loss2/conv" } layer { name: "loss2/fc" type: "InnerProduct" bottom: "loss2/conv" top: "loss2/fc" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 1024 weight_filler { type: "xavier" std: 0.019999999553 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss2/relu_fc" type: "ReLU" bottom: "loss2/fc" top: "loss2/fc" } layer { name: "loss2/drop_fc" type: "Dropout" bottom: "loss2/fc" top: "loss2/fc" dropout_param { dropout_ratio: 0.699999988079 } } layer { name: "loss2/classifier" type: "InnerProduct" bottom: "loss2/fc" top: "loss2/classifier" param 
{ lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 2 weight_filler { type: "xavier" std: 0.0009765625 } bias_filler { type: "constant" value: 0.0 } } } layer { name: "loss2/loss" type: "SoftmaxWithLoss" bottom: "loss2/classifier" bottom: "label" top: "loss2/loss" loss_weight: 0.300000011921 } layer { name: "loss2/top-1" type: "Accuracy" bottom: "loss2/classifier" bottom: "label" top: "loss2/accuracy" include { phase: TEST } } layer { name: "inception_4e/1x1" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_1x1" type: "ReLU" bottom: "inception_4e/1x1" top: "inception_4e/1x1" } layer { name: "inception_4e/3x3_reduce" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_3x3_reduce" type: "ReLU" bottom: "inception_4e/3x3_reduce" top: "inception_4e/3x3_reduce" } layer { name: "inception_4e/3x3" type: "Convolution" bottom: "inception_4e/3x3_reduce" top: "inception_4e/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 320 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_3x3" type: "ReLU" bottom: "inception_4e/3x3" top: "inception_4e/3x3" } layer { name: "inception_4e/5x5_reduce" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_5x5_reduce" type: "ReLU" bottom: "inception_4e/5x5_reduce" top: "inception_4e/5x5_reduce" } layer { name: "inception_4e/5x5" type: "Convolution" bottom: "inception_4e/5x5_reduce" top: "inception_4e/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_5x5" type: "ReLU" bottom: "inception_4e/5x5" top: "inception_4e/5x5" } layer { name: "inception_4e/pool" type: "Pooling" bottom: "inception_4d/output" top: "inception_4e/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4e/pool_proj" type: "Convolution" bottom: "inception_4e/pool" top: "inception_4e/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4e/relu_pool_proj" type: "ReLU" bottom: "inception_4e/pool_proj" top: "inception_4e/pool_proj" } layer { name: "inception_4e/output" type: "Concat" bottom: 
"inception_4e/1x1" bottom: "inception_4e/3x3" bottom: "inception_4e/5x5" bottom: "inception_4e/pool_proj" top: "inception_4e/output" } layer { name: "pool4/3x3_s2" type: "Pooling" bottom: "inception_4e/output" top: "pool4/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_5a/1x1" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_1x1" type: "ReLU" bottom: "inception_5a/1x1" top: "inception_5a/1x1" } layer { name: "inception_5a/3x3_reduce" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_3x3_reduce" type: "ReLU" bottom: "inception_5a/3x3_reduce" top: "inception_5a/3x3_reduce" } layer { name: "inception_5a/3x3" type: "Convolution" bottom: "inception_5a/3x3_reduce" top: "inception_5a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 320 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_3x3" type: "ReLU" bottom: "inception_5a/3x3" top: "inception_5a/3x3" } layer { name: "inception_5a/5x5_reduce" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_5x5_reduce" type: "ReLU" bottom: "inception_5a/5x5_reduce" top: "inception_5a/5x5_reduce" } layer { name: "inception_5a/5x5" type: "Convolution" bottom: "inception_5a/5x5_reduce" top: "inception_5a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_5x5" type: "ReLU" bottom: "inception_5a/5x5" top: "inception_5a/5x5" } layer { name: "inception_5a/pool" type: "Pooling" bottom: "pool4/3x3_s2" top: "inception_5a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_5a/pool_proj" type: "Convolution" bottom: "inception_5a/pool" top: "inception_5a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_pool_proj" type: "ReLU" bottom: "inception_5a/pool_proj" top: "inception_5a/pool_proj" } layer { name: "inception_5a/output" type: "Concat" bottom: "inception_5a/1x1" bottom: "inception_5a/3x3" bottom: "inception_5a/5x5" bottom: "inception_5a/pool_proj" top: "inception_5a/output" } layer { name: "inception_5b/1x1" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/1x1" 
param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 384 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_1x1" type: "ReLU" bottom: "inception_5b/1x1" top: "inception_5b/1x1" } layer { name: "inception_5b/3x3_reduce" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_3x3_reduce" type: "ReLU" bottom: "inception_5b/3x3_reduce" top: "inception_5b/3x3_reduce" } layer { name: "inception_5b/3x3" type: "Convolution" bottom: "inception_5b/3x3_reduce" top: "inception_5b/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_3x3" type: "ReLU" bottom: "inception_5b/3x3" top: "inception_5b/3x3" } layer { name: "inception_5b/5x5_reduce" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 48 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_5x5_reduce" type: "ReLU" bottom: "inception_5b/5x5_reduce" top: "inception_5b/5x5_reduce" } layer { name: "inception_5b/5x5" type: "Convolution" bottom: "inception_5b/5x5_reduce" top: "inception_5b/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_5x5" type: "ReLU" bottom: "inception_5b/5x5" top: "inception_5b/5x5" } layer { name: "inception_5b/pool" type: "Pooling" bottom: "inception_5a/output" top: "inception_5b/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_5b/pool_proj" type: "Convolution" bottom: "inception_5b/pool" top: "inception_5b/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5b/relu_pool_proj" type: "ReLU" bottom: "inception_5b/pool_proj" top: "inception_5b/pool_proj" } layer { name: "inception_5b/output" type: "Concat" bottom: "inception_5b/1x1" bottom: "inception_5b/3x3" bottom: "inception_5b/5x5" bottom: "inception_5b/pool_proj" top: "inception_5b/output" } layer { name: "pool5/7x7_s1" type: "Pooling" bottom: "inception_5b/output" top: "pool5/7x7_s1" pooling_param { pool: AVE kernel_size: 7 stride: 1 } } layer { name: "pool5/drop_7x7_s1" type: "Dropout" bottom: "pool5/7x7_s1" top: "pool5/7x7_s1" dropout_param { dropout_ratio: 0.40000000596 } } layer { name: "loss3/classifier_retrain" type: "InnerProduct" bottom: "pool5/7x7_s1" top: "loss3/classifier" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 
0.0 } inner_product_param { num_output: 2 weight_filler { type: "xavier" } bias_filler { type: "constant" value: 0.0 } } } layer { name: "loss3/loss" type: "SoftmaxWithLoss" bottom: "loss3/classifier" bottom: "label" top: "loss" loss_weight: 1.0 } layer { name: "loss3/top-1" type: "Accuracy" bottom: "loss3/classifier" bottom: "label" top: "accuracy" include { phase: TEST } }
spark-conf
export JAVA_HOME=/usr/java/jdk1.8.0_92
export SCALA_HOME=/opt/soft/scala-2.11.6
export SPARK_MASTER_IP=bigdata4.gds.com
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=1
export MASTER=spark://bigdata4.gds.com:7077
export SPARK_WORKER_INSTANCES=1
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
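Note that SPARK_WORKER_MEMORY is only the ceiling a standalone worker may hand out across its executors; the executor heap itself defaults to 1g unless spark.executor.memory is set at submit time. A minimal sketch, assuming a single worker and purely illustrative sizes (not measured requirements):

# give the worker more headroom, then size the executor heap to fit under that ceiling
export SPARK_WORKER_MEMORY=4g
spark-submit ... --conf spark.executor.memory=3g ...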
spark-submit
spark-submit --master ${MASTER} \
    --num-executors 1 \
    --files ${CAFFE_ON_SPARK}/data/solver.prototxt,${CAFFE_ON_SPARK}/data/train_valx.prototxt \
    --conf spark.cores.max=${TOTAL_CORES} \
    --conf spark.task.cpus=${CORES_PER_WORKER} \
    --conf spark.driver.extraLibraryPath="${LD_LIBRARY_PATH}" \
    --conf spark.executorEnv.LD_LIBRARY_PATH="${LD_LIBRARY_PATH}" \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    ${CAFFE_ON_SPARK}/caffe-grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
    -features accuracy,loss -label label \
    -conf ${CAFFE_ON_SPARK}/data/solver.prototxt \
    -clusterSize ${SPARK_WORKER_INSTANCES} \
    -devices 1 \
    -connection ethernet \
    -model hdfs://bigdata4.gds.com:9000/snapshot_iter_360.caffemodel
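Since -model is loaded from HDFS, it may also be worth a quick sanity check that the snapshot actually exists and is readable before suspecting the network; a hypothetical check using the path from the command above:

# confirm the caffemodel is present and readable on HDFS
hdfs dfs -ls hdfs://bigdata4.gds.com:9000/snapshot_iter_360.caffemodel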
Code
package com.yahoo.ml.caffe

import java.io.PrintWriter

import org.apache.spark.{SparkConf, SparkContext}
import org.slf4j.{Logger, LoggerFactory}

object CaffeOnSpark {
  private val log: Logger = LoggerFactory.getLogger(this.getClass)

  def main(args: Array[String]) {
    val sc_conf = new SparkConf()
    sc_conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    sc_conf.set("spark.scheduler.minRegisteredResourcesRatio", "1.0")
    val sc: SparkContext = new SparkContext(sc_conf)

    // Caffe-on-Spark configuration
    val conf = new Config(sc, args)

    // run the test pass against the configured data source
    val caffeSpark = new CaffeOnSpark(sc)
    val source = DataSource.getSource(conf, false)
    val result = caffeSpark.test(source)

    // save test results into a local file
    val outputPath = source.conf.outputPath
    var localFilePath: String = outputPath
    if (outputPath.startsWith(FSUtils.localfsPrefix))
      localFilePath = outputPath.substring(FSUtils.localfsPrefix.length)
    else
      localFilePath = System.getProperty("user.dir") + "/test_result.tmp"
    val out: PrintWriter = new PrintWriter(localFilePath)
    result.map { case (name, r) =>
      out.println(name + ": " + r.mkString(","))
      writeDebugLog("main", name + ": " + r.mkString(","))
    }
    out.close()
    writeDebugLog("main", "localFilePath : " + localFilePath)

    // upload the result file to HDFS when the output path is not local
    if (!outputPath.startsWith(FSUtils.localfsPrefix)) {
      FSUtils.CopyFileToHDFS(localFilePath, outputPath)
      writeDebugLog("main", "copyFile : " + localFilePath + " to " + outputPath + " end !!!")
    }
  }

  def writeDebugLog(methodName: String, message: String): Unit = {
    log.info("Method : " + methodName + ", Message : " + message)
  }
}
According to your log, your executor exited for some reason. Can you get the log of your executor? The failure could be caused by executor resource limits.
Andy
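In standalone mode, the executor log Andy asks for lives on the worker machine under its work directory; a minimal sketch, assuming the default ${SPARK_HOME}/work location (substitute the real application and executor IDs):

# each executor gets its own directory containing stdout, stderr, and the launch command
ls ${SPARK_HOME}/work/<app-id>/<executor-id>/
cat ${SPARK_HOME}/work/<app-id>/<executor-id>/stderr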
On Fri, Dec 16, 2016 at 12:52 AM, silence [email protected] wrote:
I used caffeonspark load a external caffemodel to run test ! but encounter difficulties. I run a caffeonspark job on spark-standlone cluster.the following is my configuration and log. please give me some suggestion or solution. thank you !!
run log
16/12/16 16:10:13 INFO spark.SecurityManager: Changing modify acls to: root 16/12/16 16:10:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root) 16/12/16 16:10:14 INFO util.Utils: Successfully started service 'sparkDriver' on port 48835. 16/12/16 16:10:14 INFO slf4j.Slf4jLogger: Slf4jLogger started 16/12/16 16:10:14 INFO Remoting: Starting remoting 16/12/16 16:10:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:49460] 16/12/16 16:10:14 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 49460. 16/12/16 16:10:14 INFO spark.SparkEnv: Registering MapOutputTracker 16/12/16 16:10:14 INFO spark.SparkEnv: Registering BlockManagerMaster 16/12/16 16:10:14 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-741618f2-5ef8-484d-99cd-d6cc32a31fce 16/12/16 16:10:14 INFO storage.MemoryStore: MemoryStore started with capacity 511.1 MB 16/12/16 16:10:14 INFO spark.SparkEnv: Registering OutputCommitCoordinator 16/12/16 16:10:14 INFO server.Server: jetty-8.y.z-SNAPSHOT 16/12/16 16:10:14 INFO server.AbstractConnector: Started [email protected]:4040 16/12/16 16:10:14 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 16/12/16 16:10:14 INFO ui.SparkUI: Started SparkUI at http://192.168.15.204:4040 16/12/16 16:10:14 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/httpd- b0c8c11f-0bb0-446b-ac70-8edf192269dd 16/12/16 16:10:14 INFO spark.HttpServer: Starting HTTP Server 16/12/16 16:10:14 INFO server.Server: jetty-8.y.z-SNAPSHOT 16/12/16 16:10:14 INFO server.AbstractConnector: Started [email protected]:50747 16/12/16 16:10:14 INFO util.Utils: Successfully started service 'HTTP file server' on port 50747. 16/12/16 16:10:14 INFO spark.SparkContext: Added JAR file:/opt/soft/caffeonspark/CaffeOnSpark-master/caffe- grid/target/caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar at http://192.168.15.204:50747/jars/caffe-grid-0.1-SNAPSHOT- jar-with-dependencies.jar with timestamp 1481875814627 16/12/16 16:10:14 INFO util.Utils: Copying /opt/soft/caffeonspark/ CaffeOnSpark-master/data/solver.prototxt to /tmp/spark-2f175bca-17c7-4b89- 9874-e2b8a1d0c029/userFiles-1190a2f1-d59f-45aa-873f- c28d7b87b4a6/solver.prototxt 16/12/16 16:10:14 INFO spark.SparkContext: Added file file:/opt/soft/caffeonspark/CaffeOnSpark-master/data/solver.prototxt at http://192.168.15.204:50747/files/solver.prototxt with timestamp 1481875814673 16/12/16 16:10:14 INFO util.Utils: Copying /opt/soft/caffeonspark/ CaffeOnSpark-master/data/train_valx.prototxt to /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/userFiles- 1190a2f1-d59f-45aa-873f-c28d7b87b4a6/train_valx.prototxt 16/12/16 16:10:14 INFO spark.SparkContext: Added file file:/opt/soft/caffeonspark/CaffeOnSpark-master/data/train_valx.prototxt at http://192.168.15.204:50747/files/train_valx.prototxt with timestamp 1481875814677 16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Connecting to master spark://bigdata4.gds.com:7077... 
16/12/16 16:10:14 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20161216161014-0002 16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/0 on worker-20161216160701-192.168.15.204-41550 ( 192.168.15.204:41550) with 1 cores 16/12/16 16:10:14 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 51913. 16/12/16 16:10:14 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/0 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM 16/12/16 16:10:14 INFO netty.NettyBlockTransferService: Server created on 51913 16/12/16 16:10:14 INFO storage.BlockManagerMaster: Trying to register BlockManager 16/12/16 16:10:14 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.15.204:51913 with 511.1 MB RAM, BlockManagerId(driver, 192.168.15.204, 51913) 16/12/16 16:10:14 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/0 is now RUNNING 16/12/16 16:10:14 INFO storage.BlockManagerMaster: Registered BlockManager 16/12/16 16:10:15 INFO scheduler.EventLoggingListener: Logging events to hdfs://bigdata4.gds.com:9000/logs/eventLogs/app-20161216161014-0002 16/12/16 16:10:16 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53282) with ID 0 16/12/16 16:10:16 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 1.0 16/12/16 16:10:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:55499 with 511.1 MB RAM, BlockManagerId(0, bigdata4.gds.com, 55499) 16/12/16 16:10:17 INFO caffe.DataSource$: Source data layer:1 16/12/16 16:10:17 INFO caffe.LMDB: Batch size:10 16/12/16 16:10:17 WARN caffe.Config: both -test and -features are found, we will do test only (as it is latest), disabling feature mode. 16/12/16 16:10:17 INFO caffe.CaffeOnSpark: Method : test, Message : start Test 16/12/16 16:10:17 INFO caffe.CaffeOnSpark: Method : features2, Message : start features2 ..... 
16/12/16 16:10:17 INFO caffe.LmdbRDD: local LMDB path:/opt/soft/caffeonspark/CaffeOnSpark-master/data/val_db 16/12/16 16:10:17 INFO caffe.LmdbRDD: 1 LMDB RDD partitions 16/12/16 16:10:17 INFO spark.SparkContext: Starting job: count at CaffeOnSpark.scala:435 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Got job 0 (count at CaffeOnSpark.scala:435) with 1 output partitions 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at CaffeOnSpark.scala:435) 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Parents of final stage: List() 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Missing parents: List() 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at filter at LMDB.scala:36), which has no missing parents 16/12/16 16:10:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.3 KB, free 3.3 KB) 16/12/16 16:10:17 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.1 KB, free 5.3 KB) 16/12/16 16:10:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.15.204:51913 (size: 2.1 KB, free: 511.1 MB) 16/12/16 16:10:17 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006 16/12/16 16:10:17 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at filter at LMDB.scala:36) 16/12/16 16:10:17 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks 16/12/16 16:10:17 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2115 bytes) 16/12/16 16:10:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on bigdata4.gds.com:55499 (size: 2.1 KB, free: 511.1 MB) 16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added rdd_1_0 on disk on bigdata4.gds.com:55499 (size: 30.3 MB) 16/12/16 16:10:18 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 527 ms on bigdata4.gds.com (1/1) 16/12/16 16:10:18 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 16/12/16 16:10:18 INFO scheduler.DAGScheduler: ResultStage 0 (count at CaffeOnSpark.scala:435) finished in 0.531 s 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Job 0 finished: count at CaffeOnSpark.scala:435, took 0.701952 s 16/12/16 16:10:18 INFO caffe.CaffeOnSpark: Method : features2, Message : srcDataRDD Count : 247 16/12/16 16:10:18 INFO spark.SparkContext: Starting job: count at CaffeOnSpark.scala:444 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Got job 1 (count at CaffeOnSpark.scala:444) with 1 output partitions 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at CaffeOnSpark.scala:444) 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Parents of final stage: List() 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Missing parents: List() 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:439), which has no missing parents 16/12/16 16:10:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 9.3 KB) 16/12/16 16:10:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.5 KB, free 11.8 KB) 16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.15.204:51913 (size: 2.5 KB, free: 511.1 MB) 16/12/16 16:10:18 INFO spark.SparkContext: Created broadcast 1 
from broadcast at DAGScheduler.scala:1006 16/12/16 16:10:18 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at map at CaffeOnSpark.scala:439) 16/12/16 16:10:18 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks 16/12/16 16:10:18 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes) 16/12/16 16:10:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:55499 (size: 2.5 KB, free: 511.1 MB) 16/12/16 16:10:19 ERROR scheduler.TaskSchedulerImpl: Lost executor 0 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:19 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, bigdata4.gds.com): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:19 INFO scheduler.DAGScheduler: Executor lost: 0 (epoch 0) 16/12/16 16:10:19 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 0 from BlockManagerMaster. 16/12/16 16:10:19 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(0, bigdata4.gds.com, 55499) 16/12/16 16:10:19 INFO storage.BlockManagerMaster: Removed 0 successfully in removeExecutor 16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/0 is now EXITED (Command exited with code 134) 16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/0 removed: Command exited with code 134 16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 0 16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/1 on worker-20161216160701-192.168.15.204-41550 ( 192.168.15.204:41550) with 1 cores 16/12/16 16:10:19 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/1 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM 16/12/16 16:10:19 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/1 is now RUNNING 16/12/16 16:10:20 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53289) with ID 1 16/12/16 16:10:20 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes) 16/12/16 16:10:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:50688 with 511.1 MB RAM, BlockManagerId(1, bigdata4.gds.com, 50688) 16/12/16 16:10:20 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:50688 (size: 2.5 KB, free: 511.1 MB) 16/12/16 16:10:21 ERROR scheduler.TaskSchedulerImpl: Lost executor 1 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:21 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, bigdata4.gds.com): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 
16/12/16 16:10:21 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 1) 16/12/16 16:10:21 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster. 16/12/16 16:10:21 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, bigdata4.gds.com, 50688) 16/12/16 16:10:21 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor 16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/1 is now EXITED (Command exited with code 134) 16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/1 removed: Command exited with code 134 16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 1 16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/2 on worker-20161216160701-192.168.15.204-41550 ( 192.168.15.204:41550) with 1 cores 16/12/16 16:10:21 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/2 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM 16/12/16 16:10:21 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/2 is now RUNNING 16/12/16 16:10:22 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53297) with ID 2 16/12/16 16:10:22 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes) 16/12/16 16:10:22 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:47828 with 511.1 MB RAM, BlockManagerId(2, bigdata4.gds.com, 47828) 16/12/16 16:10:23 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:47828 (size: 2.5 KB, free: 511.1 MB) 16/12/16 16:10:24 ERROR scheduler.TaskSchedulerImpl: Lost executor 2 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:24 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3, bigdata4.gds.com): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:24 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 2) 16/12/16 16:10:24 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster. 
16/12/16 16:10:24 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, bigdata4.gds.com, 47828) 16/12/16 16:10:24 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor 16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/2 is now EXITED (Command exited with code 134) 16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/2 removed: Command exited with code 134 16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 2 16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/3 on worker-20161216160701-192.168.15.204-41550 ( 192.168.15.204:41550) with 1 cores 16/12/16 16:10:24 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/3 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM 16/12/16 16:10:24 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/3 is now RUNNING 16/12/16 16:10:25 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (bigdata4.gds.com:53304) with ID 3 16/12/16 16:10:25 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com, partition 0,PROCESS_LOCAL, 2180 bytes) 16/12/16 16:10:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager bigdata4.gds.com:38664 with 511.1 MB RAM, BlockManagerId(3, bigdata4.gds.com, 38664) 16/12/16 16:10:25 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.15.204:51913 in memory (size: 2.1 KB, free: 511.1 MB) 16/12/16 16:10:25 INFO spark.ContextCleaner: Cleaned accumulator 1 16/12/16 16:10:25 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on bigdata4.gds.com:38664 (size: 2.5 KB, free: 511.1 MB) 16/12/16 16:10:26 ERROR scheduler.TaskSchedulerImpl: Lost executor 3 on bigdata4.gds.com: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 16/12/16 16:10:26 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. 
16/12/16 16:10:26 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job 16/12/16 16:10:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/3 is now EXITED (Command exited with code 134) 16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161216161014-0002/3 removed: Command exited with code 134 16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 3 16/12/16 16:10:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1 16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161216161014-0002/4 on worker-20161216160701-192.168.15.204-41550 ( 192.168.15.204:41550) with 1 cores 16/12/16 16:10:26 INFO scheduler.DAGScheduler: ResultStage 1 (count at CaffeOnSpark.scala:444) failed in 8.594 s 16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161216161014-0002/4 on hostPort 192.168.15.204:41550 with 1 cores, 1024.0 MB RAM 16/12/16 16:10:26 INFO scheduler.DAGScheduler: Executor lost: 3 (epoch 3) 16/12/16 16:10:26 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster. 16/12/16 16:10:26 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, bigdata4.gds.com, 38664) 16/12/16 16:10:26 INFO scheduler.DAGScheduler: Job 1 failed: count at CaffeOnSpark.scala:444, took 8.607960 s Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, bigdata4.gds.com): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$ scheduler$DAGScheduler$$failJobAndIndependentStages( DAGScheduler.scala:1431) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( DAGScheduler.scala:1419) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply( DAGScheduler.scala:1418) at scala.collection.mutable.ResizableArray$class.foreach( ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage( DAGScheduler.scala:1418) at org.apache.spark.scheduler.DAGScheduler$$anonfun$ handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGScheduler$$anonfun$ handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed( DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop. doOnReceive(DAGScheduler.scala:1640) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop. onReceive(DAGScheduler.scala:1599) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop. 
onReceive(DAGScheduler.scala:1588) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) at org.apache.spark.rdd.RDD.count(RDD.scala:1143) at com.yahoo.ml.caffe.CaffeOnSpark.features2(CaffeOnSpark.scala:444) at com.yahoo.ml.caffe.CaffeOnSpark.test(CaffeOnSpark.scala:382) at com.yahoo.ml.caffe.CaffeOnSpark$.main(CaffeOnSpark.scala:38) at com.yahoo.ml.caffe.CaffeOnSpark.main(CaffeOnSpark.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke( NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke( DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$ deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 16/12/16 16:10:26 INFO spark.SparkContext: Invoking stop() from shutdown hook 16/12/16 16:10:26 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor 16/12/16 16:10:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161216161014-0002/4 is now RUNNING 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null} 16/12/16 16:10:26 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null} 16/12/16 16:10:26 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.15.204:4040 16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors 16/12/16 16:10:26 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down 16/12/16 16:10:26 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 16/12/16 16:10:26 INFO storage.MemoryStore: MemoryStore cleared 16/12/16 16:10:26 INFO storage.BlockManager: BlockManager stopped 16/12/16 16:10:26 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 16/12/16 16:10:26 INFO scheduler.OutputCommitCoordinator$ OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 16/12/16 16:10:26 INFO spark.SparkContext: Successfully stopped SparkContext 16/12/16 16:10:26 INFO util.ShutdownHookManager: Shutdown hook called 16/12/16 16:10:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029/httpd- b0c8c11f-0bb0-446b-ac70-8edf192269dd 16/12/16 16:10:26 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2f175bca-17c7-4b89-9874-e2b8a1d0c029 16/12/16 16:10:26 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
solver.prototxt
test_iter: 4 test_interval: 12 base_lr: 0.001 display: 1 max_iter: 360 lr_policy: "exp" gamma: 0.985852887007 momentum: 0.9 weight_decay: 1e-05 snapshot: 12 snapshot_prefix: "snapshot" solver_mode: CPU random_seed: 7 net: "train_valx.prototxt" solver_type: SGD
train_valx.prototxt
layer { name: "train-data" type: "MemoryData" top: "data" top: "label" include { phase: TRAIN } source_class: "com.yahoo.ml.caffe.LMDB" transform_param { mirror: true crop_size: 224 mean_file: "file:///opt/soft/caffeonspark/CaffeOnSpark- master/data/mean.binaryproto" } memory_data_param { source: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/train_db" batch_size: 7 channels: 1 height: 256 width: 256 share_in_parallel: false } } layer { name: "val-data" type: "MemoryData" top: "data" top: "label" include { phase: TEST } transform_param { mirror: false crop_size: 224 mean_file: "file:///opt/soft/caffeonspark/CaffeOnSpark- master/data/mean.binaryproto" } source_class: "com.yahoo.ml.caffe.LMDB" memory_data_param { source: "file:///opt/soft/caffeonspark/CaffeOnSpark-master/data/val_db" batch_size: 10 channels: 1 height: 256 width: 256 share_in_parallel: false } } layer { name: "conv1/7x7_s2" type: "Convolution" bottom: "data" top: "conv1/7x7_s2" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 3 kernel_size: 7 stride: 2 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv1/relu_7x7" type: "ReLU" bottom: "conv1/7x7_s2" top: "conv1/7x7_s2" } layer { name: "pool1/3x3_s2" type: "Pooling" bottom: "conv1/7x7_s2" top: "pool1/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "pool1/norm1" type: "LRN" bottom: "pool1/3x3_s2" top: "pool1/norm1" lrn_param { local_size: 5 alpha: 9.99999974738e-05 beta: 0.75 } } layer { name: "conv2/3x3_reduce" type: "Convolution" bottom: "pool1/norm1" top: "conv2/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv2/relu_3x3_reduce" type: "ReLU" bottom: "conv2/3x3_reduce" top: "conv2/3x3_reduce" } layer { name: "conv2/3x3" type: "Convolution" bottom: "conv2/3x3_reduce" top: "conv2/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "conv2/relu_3x3" type: "ReLU" bottom: "conv2/3x3" top: "conv2/3x3" } layer { name: "conv2/norm2" type: "LRN" bottom: "conv2/3x3" top: "conv2/norm2" lrn_param { local_size: 5 alpha: 9.99999974738e-05 beta: 0.75 } } layer { name: "pool2/3x3_s2" type: "Pooling" bottom: "conv2/norm2" top: "pool2/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_3a/1x1" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_1x1" type: "ReLU" bottom: "inception_3a/1x1" top: "inception_3a/1x1" } layer { name: "inception_3a/3x3_reduce" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { 
name: "inception_3a/relu_3x3_reduce" type: "ReLU" bottom: "inception_3a/3x3_reduce" top: "inception_3a/3x3_reduce" } layer { name: "inception_3a/3x3" type: "Convolution" bottom: "inception_3a/3x3_reduce" top: "inception_3a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_3x3" type: "ReLU" bottom: "inception_3a/3x3" top: "inception_3a/3x3" } layer { name: "inception_3a/5x5_reduce" type: "Convolution" bottom: "pool2/3x3_s2" top: "inception_3a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_5x5_reduce" type: "ReLU" bottom: "inception_3a/5x5_reduce" top: "inception_3a/5x5_reduce" } layer { name: "inception_3a/5x5" type: "Convolution" bottom: "inception_3a/5x5_reduce" top: "inception_3a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_5x5" type: "ReLU" bottom: "inception_3a/5x5" top: "inception_3a/5x5" } layer { name: "inception_3a/pool" type: "Pooling" bottom: "pool2/3x3_s2" top: "inception_3a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_3a/pool_proj" type: "Convolution" bottom: "inception_3a/pool" top: "inception_3a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3a/relu_pool_proj" type: "ReLU" bottom: "inception_3a/pool_proj" top: "inception_3a/pool_proj" } layer { name: "inception_3a/output" type: "Concat" bottom: "inception_3a/1x1" bottom: "inception_3a/3x3" bottom: "inception_3a/5x5" bottom: "inception_3a/pool_proj" top: "inception_3a/output" } layer { name: "inception_3b/1x1" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_1x1" type: "ReLU" bottom: "inception_3b/1x1" top: "inception_3b/1x1" } layer { name: "inception_3b/3x3_reduce" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_3x3_reduce" type: "ReLU" bottom: "inception_3b/3x3_reduce" top: "inception_3b/3x3_reduce" } layer { name: "inception_3b/3x3" type: "Convolution" bottom: "inception_3b/3x3_reduce" top: "inception_3b/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 pad: 1 kernel_size: 3 weight_filler { 
type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_3x3" type: "ReLU" bottom: "inception_3b/3x3" top: "inception_3b/3x3" } layer { name: "inception_3b/5x5_reduce" type: "Convolution" bottom: "inception_3a/output" top: "inception_3b/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_5x5_reduce" type: "ReLU" bottom: "inception_3b/5x5_reduce" top: "inception_3b/5x5_reduce" } layer { name: "inception_3b/5x5" type: "Convolution" bottom: "inception_3b/5x5_reduce" top: "inception_3b/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_5x5" type: "ReLU" bottom: "inception_3b/5x5" top: "inception_3b/5x5" } layer { name: "inception_3b/pool" type: "Pooling" bottom: "inception_3a/output" top: "inception_3b/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_3b/pool_proj" type: "Convolution" bottom: "inception_3b/pool" top: "inception_3b/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_3b/relu_pool_proj" type: "ReLU" bottom: "inception_3b/pool_proj" top: "inception_3b/pool_proj" } layer { name: "inception_3b/output" type: "Concat" bottom: "inception_3b/1x1" bottom: "inception_3b/3x3" bottom: "inception_3b/5x5" bottom: "inception_3b/pool_proj" top: "inception_3b/output" } layer { name: "pool3/3x3_s2" type: "Pooling" bottom: "inception_3b/output" top: "pool3/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_4a/1x1" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_1x1" type: "ReLU" bottom: "inception_4a/1x1" top: "inception_4a/1x1" } layer { name: "inception_4a/3x3_reduce" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 96 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_3x3_reduce" type: "ReLU" bottom: "inception_4a/3x3_reduce" top: "inception_4a/3x3_reduce" } layer { name: "inception_4a/3x3" type: "Convolution" bottom: "inception_4a/3x3_reduce" top: "inception_4a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 208 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_3x3" type: "ReLU" bottom: "inception_4a/3x3" top: "inception_4a/3x3" } layer { name: 
"inception_4a/5x5_reduce" type: "Convolution" bottom: "pool3/3x3_s2" top: "inception_4a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 16 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_5x5_reduce" type: "ReLU" bottom: "inception_4a/5x5_reduce" top: "inception_4a/5x5_reduce" } layer { name: "inception_4a/5x5" type: "Convolution" bottom: "inception_4a/5x5_reduce" top: "inception_4a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 48 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_5x5" type: "ReLU" bottom: "inception_4a/5x5" top: "inception_4a/5x5" } layer { name: "inception_4a/pool" type: "Pooling" bottom: "pool3/3x3_s2" top: "inception_4a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4a/pool_proj" type: "Convolution" bottom: "inception_4a/pool" top: "inception_4a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4a/relu_pool_proj" type: "ReLU" bottom: "inception_4a/pool_proj" top: "inception_4a/pool_proj" } layer { name: "inception_4a/output" type: "Concat" bottom: "inception_4a/1x1" bottom: "inception_4a/3x3" bottom: "inception_4a/5x5" bottom: "inception_4a/pool_proj" top: "inception_4a/output" } layer { name: "loss1/ave_pool" type: "Pooling" bottom: "inception_4a/output" top: "loss1/ave_pool" pooling_param { pool: AVE kernel_size: 5 stride: 3 } } layer { name: "loss1/conv" type: "Convolution" bottom: "loss1/ave_pool" top: "loss1/conv" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0799999982119 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss1/relu_conv" type: "ReLU" bottom: "loss1/conv" top: "loss1/conv" } layer { name: "loss1/fc" type: "InnerProduct" bottom: "loss1/conv" top: "loss1/fc" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 1024 weight_filler { type: "xavier" std: 0.019999999553 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss1/relu_fc" type: "ReLU" bottom: "loss1/fc" top: "loss1/fc" } layer { name: "loss1/drop_fc" type: "Dropout" bottom: "loss1/fc" top: "loss1/fc" dropout_param { dropout_ratio: 0.699999988079 } } layer { name: "loss1/classifier" type: "InnerProduct" bottom: "loss1/fc" top: "loss1/classifier" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 2 weight_filler { type: "xavier" std: 0.0009765625 } bias_filler { type: "constant" value: 0.0 } } } layer { name: "loss1/loss" type: "SoftmaxWithLoss" bottom: "loss1/classifier" bottom: "label" top: "loss1/loss" loss_weight: 0.300000011921 } layer { name: "loss1/top-1" type: "Accuracy" bottom: "loss1/classifier" bottom: "label" top: "loss1/accuracy" include { phase: TEST } } layer { name: "inception_4b/1x1" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/1x1" param { 
lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_1x1" type: "ReLU" bottom: "inception_4b/1x1" top: "inception_4b/1x1" } layer { name: "inception_4b/3x3_reduce" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 112 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_3x3_reduce" type: "ReLU" bottom: "inception_4b/3x3_reduce" top: "inception_4b/3x3_reduce" } layer { name: "inception_4b/3x3" type: "Convolution" bottom: "inception_4b/3x3_reduce" top: "inception_4b/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 224 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_3x3" type: "ReLU" bottom: "inception_4b/3x3" top: "inception_4b/3x3" } layer { name: "inception_4b/5x5_reduce" type: "Convolution" bottom: "inception_4a/output" top: "inception_4b/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 24 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_5x5_reduce" type: "ReLU" bottom: "inception_4b/5x5_reduce" top: "inception_4b/5x5_reduce" } layer { name: "inception_4b/5x5" type: "Convolution" bottom: "inception_4b/5x5_reduce" top: "inception_4b/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_5x5" type: "ReLU" bottom: "inception_4b/5x5" top: "inception_4b/5x5" } layer { name: "inception_4b/pool" type: "Pooling" bottom: "inception_4a/output" top: "inception_4b/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4b/pool_proj" type: "Convolution" bottom: "inception_4b/pool" top: "inception_4b/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4b/relu_pool_proj" type: "ReLU" bottom: "inception_4b/pool_proj" top: "inception_4b/pool_proj" } layer { name: "inception_4b/output" type: "Concat" bottom: "inception_4b/1x1" bottom: "inception_4b/3x3" bottom: "inception_4b/5x5" bottom: "inception_4b/pool_proj" top: "inception_4b/output" } layer { name: "inception_4c/1x1" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_1x1" type: "ReLU" bottom: "inception_4c/1x1" top: "inception_4c/1x1" } layer { name: 
"inception_4c/3x3_reduce" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_3x3_reduce" type: "ReLU" bottom: "inception_4c/3x3_reduce" top: "inception_4c/3x3_reduce" } layer { name: "inception_4c/3x3" type: "Convolution" bottom: "inception_4c/3x3_reduce" top: "inception_4c/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_3x3" type: "ReLU" bottom: "inception_4c/3x3" top: "inception_4c/3x3" } layer { name: "inception_4c/5x5_reduce" type: "Convolution" bottom: "inception_4b/output" top: "inception_4c/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 24 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_5x5_reduce" type: "ReLU" bottom: "inception_4c/5x5_reduce" top: "inception_4c/5x5_reduce" } layer { name: "inception_4c/5x5" type: "Convolution" bottom: "inception_4c/5x5_reduce" top: "inception_4c/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_5x5" type: "ReLU" bottom: "inception_4c/5x5" top: "inception_4c/5x5" } layer { name: "inception_4c/pool" type: "Pooling" bottom: "inception_4b/output" top: "inception_4c/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4c/pool_proj" type: "Convolution" bottom: "inception_4c/pool" top: "inception_4c/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4c/relu_pool_proj" type: "ReLU" bottom: "inception_4c/pool_proj" top: "inception_4c/pool_proj" } layer { name: "inception_4c/output" type: "Concat" bottom: "inception_4c/1x1" bottom: "inception_4c/3x3" bottom: "inception_4c/5x5" bottom: "inception_4c/pool_proj" top: "inception_4c/output" } layer { name: "inception_4d/1x1" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 112 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_1x1" type: "ReLU" bottom: "inception_4d/1x1" top: "inception_4d/1x1" } layer { name: "inception_4d/3x3_reduce" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 144 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: 
"inception_4d/relu_3x3_reduce" type: "ReLU" bottom: "inception_4d/3x3_reduce" top: "inception_4d/3x3_reduce" } layer { name: "inception_4d/3x3" type: "Convolution" bottom: "inception_4d/3x3_reduce" top: "inception_4d/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 288 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_3x3" type: "ReLU" bottom: "inception_4d/3x3" top: "inception_4d/3x3" } layer { name: "inception_4d/5x5_reduce" type: "Convolution" bottom: "inception_4c/output" top: "inception_4d/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_5x5_reduce" type: "ReLU" bottom: "inception_4d/5x5_reduce" top: "inception_4d/5x5_reduce" } layer { name: "inception_4d/5x5" type: "Convolution" bottom: "inception_4d/5x5_reduce" top: "inception_4d/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_5x5" type: "ReLU" bottom: "inception_4d/5x5" top: "inception_4d/5x5" } layer { name: "inception_4d/pool" type: "Pooling" bottom: "inception_4c/output" top: "inception_4d/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_4d/pool_proj" type: "Convolution" bottom: "inception_4d/pool" top: "inception_4d/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 64 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_4d/relu_pool_proj" type: "ReLU" bottom: "inception_4d/pool_proj" top: "inception_4d/pool_proj" } layer { name: "inception_4d/output" type: "Concat" bottom: "inception_4d/1x1" bottom: "inception_4d/3x3" bottom: "inception_4d/5x5" bottom: "inception_4d/pool_proj" top: "inception_4d/output" } layer { name: "loss2/ave_pool" type: "Pooling" bottom: "inception_4d/output" top: "loss2/ave_pool" pooling_param { pool: AVE kernel_size: 5 stride: 3 } } layer { name: "loss2/conv" type: "Convolution" bottom: "loss2/ave_pool" top: "loss2/conv" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.0799999982119 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss2/relu_conv" type: "ReLU" bottom: "loss2/conv" top: "loss2/conv" } layer { name: "loss2/fc" type: "InnerProduct" bottom: "loss2/conv" top: "loss2/fc" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } inner_product_param { num_output: 1024 weight_filler { type: "xavier" std: 0.019999999553 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "loss2/relu_fc" type: "ReLU" bottom: "loss2/fc" top: "loss2/fc" } layer { name: "loss2/drop_fc" type: "Dropout" bottom: "loss2/fc" top: "loss2/fc" dropout_param { dropout_ratio: 0.699999988079 } } layer { name: "loss2/classifier" type: "InnerProduct" bottom: "loss2/fc" top: "loss2/classifier" param 
layer { name: "inception_4e/1x1" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_1x1" type: "ReLU" bottom: "inception_4e/1x1" top: "inception_4e/1x1" }
layer { name: "inception_4e/3x3_reduce" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_3x3_reduce" type: "ReLU" bottom: "inception_4e/3x3_reduce" top: "inception_4e/3x3_reduce" }
layer { name: "inception_4e/3x3" type: "Convolution" bottom: "inception_4e/3x3_reduce" top: "inception_4e/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 320 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_3x3" type: "ReLU" bottom: "inception_4e/3x3" top: "inception_4e/3x3" }
layer { name: "inception_4e/5x5_reduce" type: "Convolution" bottom: "inception_4d/output" top: "inception_4e/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_5x5_reduce" type: "ReLU" bottom: "inception_4e/5x5_reduce" top: "inception_4e/5x5_reduce" }
layer { name: "inception_4e/5x5" type: "Convolution" bottom: "inception_4e/5x5_reduce" top: "inception_4e/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_5x5" type: "ReLU" bottom: "inception_4e/5x5" top: "inception_4e/5x5" }
layer { name: "inception_4e/pool" type: "Pooling" bottom: "inception_4d/output" top: "inception_4e/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } }
layer { name: "inception_4e/pool_proj" type: "Convolution" bottom: "inception_4e/pool" top: "inception_4e/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_4e/relu_pool_proj" type: "ReLU" bottom: "inception_4e/pool_proj" top: "inception_4e/pool_proj" }
layer { name: "inception_4e/output" type: "Concat" bottom: "inception_4e/1x1" bottom: "inception_4e/3x3" bottom: "inception_4e/5x5" bottom: "inception_4e/pool_proj" top: "inception_4e/output" }
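(Quick sanity check on the module that just closed: Caffe's Concat layer stacks its bottoms along the channel axis by default, so the four branch widths visible above determine the output depth. A small sketch, with nothing assumed beyond the num_output values in this dump:)

```python
# inception_4e/output channel count = sum of the four branch num_outputs.
branches = {"1x1": 256, "3x3": 320, "5x5": 128, "pool_proj": 128}
print(sum(branches.values()))  # 832 channels feeding pool4/3x3_s2
```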
"inception_4e/1x1" bottom: "inception_4e/3x3" bottom: "inception_4e/5x5" bottom: "inception_4e/pool_proj" top: "inception_4e/output" } layer { name: "pool4/3x3_s2" type: "Pooling" bottom: "inception_4e/output" top: "pool4/3x3_s2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "inception_5a/1x1" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 256 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_1x1" type: "ReLU" bottom: "inception_5a/1x1" top: "inception_5a/1x1" } layer { name: "inception_5a/3x3_reduce" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 160 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_3x3_reduce" type: "ReLU" bottom: "inception_5a/3x3_reduce" top: "inception_5a/3x3_reduce" } layer { name: "inception_5a/3x3" type: "Convolution" bottom: "inception_5a/3x3_reduce" top: "inception_5a/3x3" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 320 pad: 1 kernel_size: 3 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_3x3" type: "ReLU" bottom: "inception_5a/3x3" top: "inception_5a/3x3" } layer { name: "inception_5a/5x5_reduce" type: "Convolution" bottom: "pool4/3x3_s2" top: "inception_5a/5x5_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 32 kernel_size: 1 weight_filler { type: "xavier" std: 0.20000000298 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_5x5_reduce" type: "ReLU" bottom: "inception_5a/5x5_reduce" top: "inception_5a/5x5_reduce" } layer { name: "inception_5a/5x5" type: "Convolution" bottom: "inception_5a/5x5_reduce" top: "inception_5a/5x5" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 pad: 2 kernel_size: 5 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_5x5" type: "ReLU" bottom: "inception_5a/5x5" top: "inception_5a/5x5" } layer { name: "inception_5a/pool" type: "Pooling" bottom: "pool4/3x3_s2" top: "inception_5a/pool" pooling_param { pool: MAX kernel_size: 3 stride: 1 pad: 1 } } layer { name: "inception_5a/pool_proj" type: "Convolution" bottom: "inception_5a/pool" top: "inception_5a/pool_proj" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 128 kernel_size: 1 weight_filler { type: "xavier" std: 0.10000000149 } bias_filler { type: "constant" value: 0.20000000298 } } } layer { name: "inception_5a/relu_pool_proj" type: "ReLU" bottom: "inception_5a/pool_proj" top: "inception_5a/pool_proj" } layer { name: "inception_5a/output" type: "Concat" bottom: "inception_5a/1x1" bottom: "inception_5a/3x3" bottom: "inception_5a/5x5" bottom: "inception_5a/pool_proj" top: "inception_5a/output" } layer { name: "inception_5b/1x1" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/1x1" 
layer { name: "inception_5b/1x1" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/1x1" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 384 kernel_size: 1 weight_filler { type: "xavier" std: 0.0299999993294 } bias_filler { type: "constant" value: 0.20000000298 } } }
layer { name: "inception_5b/relu_1x1" type: "ReLU" bottom: "inception_5b/1x1" top: "inception_5b/1x1" }
layer { name: "inception_5b/3x3_reduce" type: "Convolution" bottom: "inception_5a/output" top: "inception_5b/3x3_reduce" param { lr_mult: 1.0 decay_mult: 1.0 } param { lr_mult: 2.0 decay_mult: 0.0 } convolution_param { num_output: 192 kernel_size: 1 weight_filler { type: "xavier" std: 0.0900000035763 } bias_filler {