MobileNet-Caffe
The regularization of depthwise convolution
The authors wrote the following in the paper: "Additionally, we found that it was important to put very little or no weight decay (l2 regularization) on the depthwise filters since there are so few parameters in them."
Therefore, I think we should set decay_mult: 0.0 for the depthwise convolution layers in the MobileNet prototxt.
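As a rough sketch of what that change could look like, here is a single depthwise layer with weight decay switched off. The layer and blob names, channel count, and filler are illustrative assumptions and are not copied from this repository's prototxt. In Caffe, the decay applied to a parameter is the solver's weight_decay multiplied by that parameter's decay_mult, so a value of 0 exempts the depthwise filters while the other layers keep their usual regularization.

```
# Hypothetical depthwise 3x3 convolution; names and sizes are placeholders.
layer {
  name: "conv2_1/dw"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2_1/dw"
  param {
    lr_mult: 1
    decay_mult: 0   # no L2 regularization on the depthwise filters
  }
  convolution_param {
    num_output: 32
    group: 32       # group == num_output makes the convolution depthwise
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: false
    weight_filler {
      type: "msra"
    }
  }
}
```

The pointwise (1x1) layers would keep decay_mult: 1 under this reading of the paper, since the quoted sentence only concerns the depthwise filters.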
Isn't this line taken from the MobileNetV1 paper? I couldn't find any such statement in the MobileNetV2 paper.
I wonder whether all parameters are meant to be decayed in MobileNetV2 training. At least, that is the impression I get from the (very few) repositories that provide a training script, e.g. https://github.com/Randl/MobileNetV2-pytorch