MobileNetv2-SSDLite

The loss value stopped dropping once it came down to about 7.x?

Open wangzhe0623 opened this issue 5 years ago • 14 comments

Hi @chuanqi305, thanks for your great work. I've run into a problem. I used the scripts to convert the TensorFlow model to a Caffe model and got "deploy.caffemodel". After that, I used those weights to fine-tune on my 2-class dataset. The network configuration is exactly what you provide in "ssdlite/voc", except that I changed the layer names and the output channels of the "conf" layers. During training, with a learning rate of 0.0001 at the start, the loss dropped until it reached about 7 and then stopped, so the weights didn't converge at all. What could be wrong? Eager for your reply!
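For reference, the kind of conf-layer edit I mean looks roughly like the sketch below. The layer name, filler settings and plain 3x3 convolution form are only illustrative (the SSDLite heads use depthwise separable convolutions), but the num_output rule is the same: priors_per_location * num_classes including background, e.g. 6 * 21 = 126 for VOC becomes 6 * 2 = 12 for my 2-class setup.

layer {
  name: "conv_13_mbox_conf_new"   # renamed so the old 21-class weights are not copied over
  type: "Convolution"
  bottom: "conv_13"
  top: "conv_13_mbox_conf_new"
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 2.0 decay_mult: 0.0 }
  convolution_param {
    num_output: 12                # 6 priors per location * 2 classes (background + my object class)
    pad: 1
    kernel_size: 3
    weight_filler { type: "msra" }
    bias_filler { type: "constant" value: 0.0 }
  }
}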

wangzhe0623 avatar Jul 15 '18 03:07 wangzhe0623

I have the same problem...

yokings avatar Jul 19 '18 10:07 yokings

Same question, have you solved it?

zyc4me avatar Jul 27 '18 01:07 zyc4me

Same question.......

I0802 16:57:27.196552 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.57651 (* 1 = 6.57651 loss)
I0802 16:57:27.196557 21083 sgd_solver.cpp:138] Iteration 7310, lr = 0.05
I0802 16:57:28.975070 21083 solver.cpp:243] Iteration 7320, loss = 7.2598
I0802 16:57:28.975109 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.0054 (* 1 = 7.0054 loss)
I0802 16:57:28.975116 21083 sgd_solver.cpp:138] Iteration 7320, lr = 0.05
I0802 16:57:30.831727 21083 solver.cpp:243] Iteration 7330, loss = 7.34756
I0802 16:57:30.831763 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.6218 (* 1 = 6.6218 loss)
I0802 16:57:30.831768 21083 sgd_solver.cpp:138] Iteration 7330, lr = 0.05
I0802 16:57:32.776068 21083 solver.cpp:243] Iteration 7340, loss = 7.76406
I0802 16:57:32.776100 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.26238 (* 1 = 7.26238 loss)
I0802 16:57:32.776106 21083 sgd_solver.cpp:138] Iteration 7340, lr = 0.05
I0802 16:57:34.534003 21083 solver.cpp:243] Iteration 7350, loss = 7.65554
I0802 16:57:34.534036 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.10536 (* 1 = 7.10536 loss)
I0802 16:57:34.534042 21083 sgd_solver.cpp:138] Iteration 7350, lr = 0.05
I0802 16:57:36.399013 21083 solver.cpp:243] Iteration 7360, loss = 6.90834
I0802 16:57:36.399049 21083 solver.cpp:259]     Train net output #0: mbox_loss = 6.39814 (* 1 = 6.39814 loss)
I0802 16:57:36.399055 21083 sgd_solver.cpp:138] Iteration 7360, lr = 0.05
I0802 16:57:38.430330 21083 solver.cpp:243] Iteration 7370, loss = 7.53202
I0802 16:57:38.430369 21083 solver.cpp:259]     Train net output #0: mbox_loss = 7.61347 (* 1 = 7.61347 loss)

jimchen2018 avatar Aug 02 '18 08:08 jimchen2018

Just continue to train.

zyc4me avatar Aug 09 '18 05:08 zyc4me

@allenwangcheng @jimchen2018 I solved it. My dataset is just quite difficult.

wangzhe0623 avatar Aug 14 '18 06:08 wangzhe0623

@wangzhe0623 Hello, I ran into the same problem as you. Could you explain what you mean by a difficult dataset? Is your dataset too complex to train on?

yulong112 avatar Aug 15 '18 04:08 yulong112

@yulong112 YES!

wangzhe0623 avatar Aug 15 '18 06:08 wangzhe0623

@wangzhe0623 Thanks! But what was your solution? Did you just continue to train, or change your dataset?

yulong112 avatar Aug 15 '18 08:08 yulong112

@wangzhe0623 I'd also like to know your solution, thanks! How large is your training dataset, and how many iterations did you train?

zhanghanbin3159 avatar Sep 06 '18 02:09 zhanghanbin3159

I had the same question and searched for a solution. I found that batch_norm_param's use_global_stats must be set to false; after I added that parameter, my training loss decreased. You can try it, it may help you! The BN parameters look like the following:

layer {
  name: "conv_1/expand/bn"
  type: "BatchNorm"
  bottom: "conv_1/expand"
  top: "conv_1/expand"
  batch_norm_param {
    use_global_stats: false
    eps: 1e-5
    #eps: 0.001
  }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
}

Hanlos avatar Apr 28 '19 08:04 Hanlos

@Hanlos The value of use_global_stats is already false in the TRAIN phase, so I don't think the problem is related to use_global_stats. Did you modify any other values?

passion3394 avatar May 18 '19 01:05 passion3394

@wangzhe0623 Hello, I used WIDER FACE to train the SSDLite model. Could you please tell me how you used deploy.caffemodel to fine-tune on a 2-class dataset? Did you convert the COCO model to a 2-class model directly, or did you convert the COCO model to a VOC model and then use only part of the weights to fine-tune your model? Please help me, thanks.

wsycl avatar May 22 '19 03:05 wsycl

@Hanlos The value of use_global_stats is already false in the TRAIN phase, so I don't think the problem is related to use_global_stats. Did you modify any other values?

I have the same problem... have you solved it? Please help me. Thanks!

1343464520 avatar May 14 '20 06:05 1343464520

Hi @chuanqi305, thanks for your great work. I've run into a problem. I used the scripts to convert the TensorFlow model to a Caffe model and got "deploy.caffemodel". After that, I used those weights to fine-tune on my 2-class dataset. The network configuration is exactly what you provide in "ssdlite/voc", except that I changed the layer names and the output channels of the "conf" layers. During training, with a learning rate of 0.0001 at the start, the loss dropped until it reached about 7 and then stopped, so the weights didn't converge at all. What could be wrong? Eager for your reply!

Increase the learning rate and adopt an annealing schedule. Using this method, the loss decreased from 4 to 2 in two hours.
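For example, something along these lines in solver.prototxt (the numbers here only illustrate the idea, they are not the exact values I used):

base_lr: 0.001            # noticeably higher than the 0.0001 in the original post
lr_policy: "multistep"    # annealing: multiply the rate by gamma at each stepvalue
gamma: 0.1
stepvalue: 20000
stepvalue: 40000
max_iter: 60000
momentum: 0.9
weight_decay: 0.00005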

weilanShi avatar Jun 29 '20 11:06 weilanShi