Train network with no bias in convolution layer
Hi @zhreshold I already trained my MobileNet-SSD network in Caffe with no bias. However, the convergence was too slow (mAP ~35% after 3 days). I just tried MXNet and found that its training performance is significantly better than Caffe's, but I don't know how to remove the 'beta' term in the batch norm layer in MXNet like I did in Caffe. As an alternative, I removed the batchnorm layer entirely, but then the network couldn't converge. Can you give me some hints?
You can set the lr_mult of the batchnorm beta term to 0 to fix beta, which is initialized to 0.
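A minimal sketch of that suggestion with the MXNet symbol API; the layer names (conv1, conv1_bn) are just placeholders:

```python
import mxnet as mx

# Placeholder conv layer; the point is only the BatchNorm wiring below.
data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')

# Declare beta explicitly with lr_mult=0: the optimizer then never updates
# it, so it stays at its zero initialization and BN adds no learned shift.
beta = mx.sym.Variable('conv1_bn_beta', lr_mult=0.0)
bn = mx.sym.BatchNorm(data=conv, beta=beta, name='conv1_bn')
```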
Thanks @zhreshold, it worked!
@zhreshold does this also fix the 'gamma' term? Should I remove 'fix_gamma=True' in the BN layer?
@titikid You can fix gamma or leave it unfixed, depending on your results, but I would prefer to leave it free.
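For reference, a short sketch of what leaving gamma free looks like, reusing the placeholder names from the snippet above:

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')
beta = mx.sym.Variable('conv1_bn_beta', lr_mult=0.0)  # frozen shift, as above

# fix_gamma=True freezes gamma at 1 (normalization only); fix_gamma=False
# lets gamma be learned as a per-channel scale while beta stays frozen at 0.
bn = mx.sym.BatchNorm(data=conv, beta=beta, fix_gamma=False, name='conv1_bn')
```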
I already trained 2 models from scratch; all parameters were set to the defaults (lr=0.004, batch=48, single GPU):
- model with fixed beta (only in the base MobileNet network) and fixed gamma: ~41.5% mAP after 220 epochs. Train log here
- model with fixed beta (only in the base MobileNet network): ~42% mAP after 220 epochs. Train log here

@zhreshold Can you take a look and give me some tips for a better mAP? Should I train on a bigger dataset first and then fine-tune on VOC2007/2012?
You have to use ImageNet pre-trained weights; otherwise you need a DSSD variant.
Hi @zhreshold I found that if I fix the 'beta' term only, the convolution output still has a small shift because of the 'running_mean' term. I set the 'lr_mult' of 'running_mean' to 0, but I still see it updated during training. So how can I completely remove it?
@zhreshold can you give me some suggestions?
@titikid For maximum flexibility I suggest you use a broadcast multiply instead of batchnorm itself. You get full control over the behavior without hacking batchnorm.
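A minimal sketch of that idea, assuming a learnable per-channel scale in place of BatchNorm (names like 'conv1_scale' are made up). Note that BN's running_mean/running_var are moving statistics maintained by the operator's forward pass, not by the optimizer, so lr_mult cannot freeze them; replacing BN with a plain broadcast multiply sidesteps them entirely:

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), no_bias=True, name='conv1')

# One learnable scale per output channel, shaped (1, C, 1, 1) so it
# broadcasts over the batch and spatial dims of the NCHW conv output.
# No beta, no running statistics: the output has no additive shift at all.
scale = mx.sym.Variable('conv1_scale', shape=(1, 64, 1, 1),
                        init=mx.init.One())
out = mx.sym.broadcast_mul(conv, scale)
```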
@zhreshold I'm not really clear on what you mean for now, but I will investigate it. Thanks!