slimming
Bias masking in BN layers
Hi @liuzhuang13
I'm not sure whether you should mask out the bias in BN layers too (v.bias:cmul(mask)), since what you minimize and prune is actually the weight, not the bias. For a BN layer, y = γx + β. You prune channels with small γ, but what about β? It may be large or important. In my case, after I masked out β I got an enormous accuracy drop.
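To make the question concrete, here is a minimal numpy sketch (not the repo's Torch code; the parameter values and threshold are made up for illustration) of what masking both γ and β by a channel mask looks like:

```python
import numpy as np

# Hypothetical per-channel BN parameters for a 6-channel layer.
gamma = np.array([0.9, 0.01, 1.2, 0.02, 0.7, 0.005])
beta  = np.array([0.1, 2.5, -0.3, 1.8, 0.0, 3.0])

# Channels whose scaling factor gamma falls below a threshold are pruned.
mask = (np.abs(gamma) > 0.05).astype(gamma.dtype)

gamma_masked = gamma * mask
beta_masked = beta * mask  # the question: beta may be large on pruned channels
print(mask)
```

Note that on the pruned channels (mask = 0), β here is 2.5, 1.8, and 3.0, i.e. far from zero, which is what makes zeroing it look risky.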
If I have misunderstood the work, please tell me. Thank you.
In my experiments, masking out the bias did not seem to change accuracy much. I believe this is because if γ is zero, the output of that channel is the same constant (β) for all inputs, so the channel carries no information and the network learns to keep β small. Even if β is large, that channel outputs the same activation for every input, so I think it is not that important. If there is an accuracy drop in your experiment, I think fine-tuning can recover it.
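The argument above can be checked with a small numpy sketch (an illustration, not the repo's code): with γ = 0, the BN output is the constant β for every input, so the channel is already uninformative before β is masked.

```python
import numpy as np

def bn_channel(x, gamma, beta, eps=1e-5):
    # Per-channel batch norm: normalize, then scale by gamma and shift by beta.
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)
    return gamma * x_hat + beta

x = np.random.randn(100)
# With gamma masked to zero, the channel output collapses to the constant beta,
# regardless of the input, so it carries no information downstream.
out = bn_channel(x, gamma=0.0, beta=3.7)
print(np.allclose(out, 3.7))  # True: every activation equals beta
```

Whether a constant shift of β flowing into the next layer matters in practice is exactly what fine-tuning after pruning compensates for.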