MobileNet-Caffe
                        Training does not converge
Hi,
When I change the group convolution to depthwise convolution, there is no convergence of the network training. What is the reason for this? Thx! modified: layer { name: "conv2_1/dpwise" type: "DepthwiseConvolution" bottom: "conv2_1/expand/bn" top: "conv2_1/dwise" param { lr_mult: 1 decay_mult: 1 } convolution_param { num_output: 32 bias_term: false pad: 1 kernel_size: 3
weight_filler {
  type: "msra"
}
} } Training.... I0309 13:11:24.893823 18939 solver.cpp:272] Solving MOBILENET_V2 I0309 13:11:24.893851 18939 solver.cpp:273] Learning Rate Policy: poly I0309 13:11:25.859395 18939 solver.cpp:218] Iteration 0 (0 iter/s, 0.915814s/20 iters), loss = 7.08397 I0309 13:11:25.859428 18939 solver.cpp:237] Train net output #0: loss = 7.08397 (* 1 = 7.08397 loss) I0309 13:11:25.859437 18939 sgd_solver.cpp:105] Iteration 0, lr = 0.045 I0309 13:11:36.243708 18939 solver.cpp:218] Iteration 20 (1.92605 iter/s, 10.3839s/20 iters), loss = 6.98624 I0309 13:11:36.243757 18939 solver.cpp:237] Train net output #0: loss = 6.98624 (* 1 = 6.98624 loss) I0309 13:11:36.243767 18939 sgd_solver.cpp:105] Iteration 20, lr = 0.0449991 I0309 13:11:46.530542 18939 solver.cpp:218] Iteration 40 (1.94431 iter/s, 10.2864s/20 iters), loss = 6.92067 I0309 13:11:46.530589 18939 solver.cpp:237] Train net output #0: loss = 6.92067 (* 1 = 6.92067 loss) I0309 13:11:46.530599 18939 sgd_solver.cpp:105] Iteration 40, lr = 0.0449982 I0309 13:11:56.811890 18939 solver.cpp:218] Iteration 60 (1.94534 iter/s, 10.281s/20 iters), loss = 6.92625 I0309 13:11:56.812000 18939 solver.cpp:237] Train net output #0: loss = 6.92625 (* 1 = 6.92625 loss) I0309 13:11:56.812011 18939 sgd_solver.cpp:105] Iteration 60, lr = 0.0449973 I0309 13:12:07.103955 18939 solver.cpp:218] Iteration 80 (1.94333 iter/s, 10.2916s/20 iters), loss = 6.91425 I0309 13:12:07.104001 18939 solver.cpp:237] Train net output #0: loss = 6.91425 (* 1 = 6.91425 loss) I0309 13:12:07.104009 18939 sgd_solver.cpp:105] Iteration 80, lr = 0.0449964 I0309 13:12:17.393060 18939 solver.cpp:218] Iteration 100 (1.94388 iter/s, 10.2887s/20 iters), loss = 6.91382 I0309 13:12:17.393095 18939 solver.cpp:237] Train net output #0: loss = 6.91382 (* 1 = 6.91382 loss) I0309 13:12:17.393105 18939 sgd_solver.cpp:105] Iteration 100, lr = 0.0449955 I0309 13:12:27.754611 18939 solver.cpp:218] Iteration 120 (1.93029 iter/s, 10.3611s/20 iters), loss = 87.3365 I0309 13:12:27.754704 
18939 solver.cpp:237] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss) I0309 13:12:27.754716 18939 sgd_solver.cpp:105] Iteration 120, lr = 0.0449946 I0309 13:12:38.106243 18939 solver.cpp:218] Iteration 140 (1.93215 iter/s, 10.3512s/20 iters), loss = 87.3365 I0309 13:12:38.106288 18939 solver.cpp:237] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss) I0309 13:12:38.106298 18939 sgd_solver.cpp:105] Iteration 140, lr = 0.0449937 I0309 13:12:48.467030 18939 solver.cpp:218] Iteration 160 (1.93043 iter/s, 10.3604s/20 iters), loss = 87.3365 I0309 13:12:48.467075 18939 solver.cpp:237] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss) I0309 13:12:48.467085 18939 sgd_solver.cpp:105] Iteration 160, lr = 0.0449928 I0309 13:12:58.827340 18939 solver.cpp:218] Iteration 180 (1.93052 iter/s, 10.3599s/20 iters), loss = 87.3365 I0309 13:12:58.827447 18939 solver.cpp:237] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss) I0309 13:12:58.827459 18939 sgd_solver.cpp:105] Iteration 180, lr = 0.0449919 I0309 13:13:09.212028 18939 solver.cpp:218] Iteration 200 (1.926 iter/s, 10.3842s/20 iters), loss = 87.3365 I0309 13:13:09.212060 18939 solver.cpp:237] Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss) I0309 13:13:09.212086 18939 sgd_solver.cpp:105] Iteration 200, lr = 0.044991
Remove the use_global_stats:true line in each of the BatchNorm layers.
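For reference, a training-phase BatchNorm layer with that line removed might look like the sketch below. The layer name and bottom/top blob names here are hypothetical; the point is only that when `use_global_stats` is absent (or false) during TRAIN, Caffe computes per-batch statistics instead of reusing the stored moving averages, which is what the training phase needs:

```
layer {
  # Hypothetical names for illustration; match them to your own net.
  name: "conv2_1/dwise/bn"
  type: "BatchNorm"
  bottom: "conv2_1/dwise"
  top: "conv2_1/dwise/bn"
  # No use_global_stats: true here, so batch statistics are used in TRAIN.
  # The three internal blobs (mean, variance, moving-average factor)
  # are updated by the running-average rule, not by the solver:
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
}
```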
Does changing the group convolution to depthwise convolution reduce performance?
@SmartMachineBay I got the same loss value as you when I fine-tuned the network on my own dataset. Do you have any ideas?