Train network with no bias in convolution layer

Open ndcuong91 opened this issue 6 years ago • 10 comments

Hi @zhreshold, I already trained my MobileNet-SSD network in Caffe with no bias. However, the convergence was too slow (mAP ~35% after 3 days). I just tried MXNet and found that its training performance is significantly better than Caffe's. But I don't know how to remove the 'beta' term in the batch norm layers in MXNet like I did in Caffe. As an alternative, I removed the batchnorm layers entirely, but then the network couldn't converge. Can you give me some hints?

ndcuong91 avatar Nov 08 '18 07:11 ndcuong91

You can set the lr_mult of the batchnorm beta term to 0 to fix beta, which is initialized to 0.
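A minimal sketch of how that could look in a symbol definition (the layer names here are placeholders, not the ones used in this repo):

```python
import mxnet as mx

def conv_bn_relu(data, name, num_filter, kernel=(3, 3), stride=(1, 1), pad=(1, 1)):
    # Convolution without a bias term
    conv = mx.sym.Convolution(data=data, num_filter=num_filter, kernel=kernel,
                              stride=stride, pad=pad, no_bias=True,
                              name=name + '_conv')
    # lr_mult=0 freezes beta at its initial value of 0, so the BN layer adds no shift
    beta = mx.sym.Variable(name + '_bn_beta', lr_mult=0.0)
    bn = mx.sym.BatchNorm(data=conv, beta=beta, fix_gamma=False, name=name + '_bn')
    return mx.sym.Activation(data=bn, act_type='relu', name=name + '_relu')
```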

zhreshold avatar Nov 08 '18 19:11 zhreshold

Thanks @zhreshold, it worked!

ndcuong91 avatar Nov 09 '18 04:11 ndcuong91

@zhreshold does this also fix the 'gamma' term? Should I remove 'fix_gamma=True' in the BN layer?

ndcuong91 avatar Nov 09 '18 06:11 ndcuong91

@titikid You can fix gamma or leave it unfixed, depending on your results, but I would prefer to leave it free.

zhreshold avatar Nov 09 '18 19:11 zhreshold

I already trained 2 models from scratch, with all parameters set to the defaults (lr=0.004, batch=48, single GPU):

  • model with fixed beta (only in the base MobileNet network) and fixed gamma: ~41.5% mAP after 220 epochs. Train log here
  • model with fixed beta (only in the base MobileNet network): ~42% mAP after 220 epochs. Train log here

@zhreshold Can you take a look and give me some tips for better mAP? Should I train on a bigger dataset first and then fine-tune on VOC2007/2012?

ndcuong91 avatar Nov 10 '18 10:11 ndcuong91

You have to use ImageNet pre-trained weights; otherwise you need a DSSD variant.

zhreshold avatar Nov 11 '18 19:11 zhreshold

Hi @zhreshold, I found that if I remove only the "beta" term, the convolution still has a small shift because of the "running_mean" term. I set the "lr_mult" of the "running_mean" term to 0, but I still see it being updated during training. So, how can I completely remove it?
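Listing the symbol's arguments and auxiliary states suggests why lr_mult has no effect here: running_mean is an auxiliary state that the BatchNorm operator itself updates during the forward pass, not an argument the optimizer touches. A minimal check with the standard mx.sym.BatchNorm (names here are just for illustration):

```python
import mxnet as mx

data = mx.sym.Variable('data')
bn = mx.sym.BatchNorm(data=data, fix_gamma=False, name='bn')

# gamma/beta are learnable arguments; the running statistics are auxiliary states
print(bn.list_arguments())          # ['data', 'bn_gamma', 'bn_beta']
print(bn.list_auxiliary_states())   # ['bn_moving_mean', 'bn_moving_var']
```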

ndcuong91 avatar Dec 03 '18 10:12 ndcuong91

@zhreshold can you give me some suggestions?

ndcuong91 avatar Dec 12 '18 04:12 ndcuong91

@titikid For maximum flexibility I suggest you use a broadcast multiply instead of batchnorm itself. You have full control over the behavior without hacking batchnorm.
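A minimal sketch of what that could look like (the variable names and the 64-channel shape are placeholders, not from this repo):

```python
import mxnet as mx

data = mx.sym.Variable('data')
# Convolution with no bias, as in the original setup
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3), pad=(1, 1),
                          no_bias=True, name='conv')
# Learnable per-channel scale, broadcast over N, H, W; there is no additive
# beta and no running mean, so no shift term can sneak back in
gamma = mx.sym.Variable('conv_scale', shape=(1, 64, 1, 1), init=mx.init.One())
scaled = mx.sym.broadcast_mul(conv, gamma)
out = mx.sym.Activation(data=scaled, act_type='relu', name='conv_relu')
```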

zhreshold avatar Dec 12 '18 21:12 zhreshold

@zhreshold I'm not really clear on what you mean for now, but I will investigate it. Thanks!

ndcuong91 avatar Dec 13 '18 10:12 ndcuong91