Initialization of last layer to zero
Guys, I just remembered a trick that we used to use in Kaldi to help models converge early on, and I tried it on a setup that was not converging well; it had a huge effect. I want to remind you of this (I don't have time to try it on one of our standard setups just now). It's just to set the last layer's parameters to zero:
def __init__(self):
    <snip>
    # 1x1 convolution producing per-frame class scores.
    self.final_conv1d = nn.Conv1d(dim, num_classes, stride=1, kernel_size=1, bias=True)
    self.reset_parameters()

def reset_parameters(self):
    # The trick: start the final layer at exactly zero.
    torch.nn.init.constant_(self.final_conv1d.weight, 0.)
    torch.nn.init.constant_(self.final_conv1d.bias, 0.)
Mm, on the master branch with the transformer, this gives an OOM error. We need some code in LFMmiLoss to conditionally prune the lattices more if they are too large. @csukuangfj can you point me to any code that does this?
@danpovey Please see https://github.com/k2-fsa/snowfall/blob/ed4c74a210e005d8ed9e767a96b70b79271ab002/snowfall/decoding/lm_rescore.py#L262-L281
It is from #147
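For reference, the kind of conditional pruning being discussed looks roughly like the sketch below. This is an illustrative sketch, not the code linked above; the function name prune_if_too_large, the max_arcs limit, and the threshold schedule are made up here, and it assumes k2.prune_on_arc_post is available.

import k2

def prune_if_too_large(lattice: k2.Fsa,
                       max_arcs: int = 10_000_000,
                       thresholds=(1e-10, 1e-9, 1e-8, 1e-7, 1e-6)) -> k2.Fsa:
    # Prune with increasingly aggressive arc-posterior thresholds
    # until the lattice is small enough.
    for th in thresholds:
        if lattice.num_arcs <= max_arcs:
            break
        # Drop arcs whose posterior probability is below `th`.
        lattice = k2.prune_on_arc_post(lattice, th, use_double_scores=True)
    return lattice

The loop only prunes when the lattice is actually over the size limit, so small lattices pass through unchanged.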
That's a cool trick. Why does it work?
Mm, actually in snowfall, now that I test it properly, it's not clear that it's working. It's OK to leave one layer initialized to zero, though: the derivs will still be nonzero.
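To illustrate the point about the derivs (a small check added here, not from the thread): even when the final layer is exactly zero, the gradient of a cross-entropy loss with respect to its weights is nonzero as long as its inputs are nonzero, so the layer can still move away from zero. The shapes and targets below are arbitrary.

import torch
import torch.nn as nn
import torch.nn.functional as F

final = nn.Linear(4, 3)        # stand-in for the zero-initialized last layer
nn.init.zeros_(final.weight)
nn.init.zeros_(final.bias)

x = torch.randn(2, 4)          # pretend these are hidden activations
logits = final(x)              # all zeros -> uniform output distribution
loss = F.cross_entropy(logits, torch.tensor([0, 2]))
loss.backward()

print(final.weight.grad.abs().sum())  # nonzero, so the layer can still learn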