slimming
Bias masking in BN layers
Hi @liuzhuang13
I'm not sure whether you should mask out the bias in BN layers too (v.bias:cmul(mask)), since what you minimize and prune is actually the weight, not the bias. For a BN layer, y = γx + β. You prune channels with small γ, but what about β? It may be large or important. In my case, after I masked out β I got an enormous accuracy drop.
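To make the question concrete, here is a minimal numpy sketch (not the repo's Torch code; the parameter values and threshold are made up for illustration) of what masking both γ and β by a channel mask looks like:

```python
import numpy as np

# Hypothetical per-channel BN parameters for a 6-channel layer.
gamma = np.array([0.9, 0.01, 1.2, 0.02, 0.7, 0.005])
beta  = np.array([0.1, 2.5, -0.3, 1.8, 0.0, 3.0])

# Channels whose scaling factor gamma falls below a threshold are pruned.
mask = (np.abs(gamma) > 0.05).astype(gamma.dtype)

gamma_masked = gamma * mask
beta_masked = beta * mask  # the question: beta may be large on pruned channels
print(mask)
```

Note that on the pruned channels (mask = 0), β here is 2.5, 1.8, and 3.0, i.e. far from zero, which is what makes zeroing it look risky.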
If I have misunderstood the work, please tell me. Thank you.
In my experiments, masking out the bias did not seem to change accuracy much. I believe this is because if γ is zero, the output of that channel is the same constant (β) for all inputs, so the channel carries no information and the network learns to keep β small. Even if β is large, that channel outputs the same activation for every input, so I think it is not that important. If there is an accuracy drop in your experiment, I think fine-tuning can recover it.
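The argument above can be checked with a small numpy sketch (an illustration, not the repo's code): with γ = 0, the BN output is the constant β for every input, so the channel is already uninformative before β is masked.

```python
import numpy as np

def bn_channel(x, gamma, beta, eps=1e-5):
    # Per-channel batch norm: normalize, then scale by gamma and shift by beta.
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

x = np.random.randn(100)
# With gamma masked to zero, the channel output collapses to the constant beta,
# regardless of the input, so it carries no information downstream.
out = bn_channel(x, gamma=0.0, beta=3.7)
print(np.allclose(out, 3.7))  # True: every activation equals beta
```

Whether a constant shift of β flowing into the next layer matters in practice is exactly what fine-tuning after pruning compensates for.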