
What is B?

Open · felixhao28 opened this issue 7 years ago · 3 comments

I am trying to follow your code, but this is where I get lost:

            self.B = input_.data.new(input_.size()).bernoulli_(self.p)
            self.noise = self.U * self.B

What is the purpose of B? To simulate some kind of dropout for the noise? Is it mentioned in the paper somewhere?
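To make the question concrete, here is a minimal, standard-library-only sketch of what the two quoted lines do elementwise. It assumes (the snippet does not show this) that `self.U` holds logistic noise `log(u) - log(1 - u)` with `u ~ Uniform(0, 1)`, i.e. the difference of two Gumbel(0, 1) samples; the function names are hypothetical. One detail worth noting: `Tensor.bernoulli_(p)` writes 1 with probability `p`, so if `self.p` is passed straight through, it is the probability of keeping the noise at a position, not of dropping it.

```python
import math
import random


def logistic_noise(n, rng):
    """Sample n values of log(u) - log(1 - u), u ~ Uniform(0, 1).

    This is standard logistic noise (difference of two Gumbel(0, 1)
    draws); assumed here to be what self.U contains.
    """
    out = []
    for _ in range(n):
        u = rng.random()          # in [0, 1), so 1 - u > 0
        while u == 0.0:           # avoid log(0)
            u = rng.random()
        out.append(math.log(u) - math.log(1.0 - u))
    return out


def masked_noise(n, p, rng):
    """Mimic `B = ...bernoulli_(p)` followed by `noise = U * B`."""
    U = logistic_noise(n, rng)
    # bernoulli_(p) yields 1 with probability p: p is the KEEP rate
    B = [1.0 if rng.random() < p else 0.0 for _ in range(n)]
    return [u * b for u, b in zip(U, B)], B


rng = random.Random(0)
noise, B = masked_noise(10_000, 0.5, rng)
kept = sum(B) / len(B)
print(f"fraction of positions that keep their noise: {kept:.3f}")
```

Wherever `B` is 0 the noise term vanishes, so those gate pre-activations are left deterministic for that forward pass.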

Thanks in advance.

source: https://github.com/zhuohan123/g2-lstm/blob/master/language-modeling/g2_lstm.py#L42

felixhao28 · Aug 08 '18 13:08

I think his code is totally different from the paper.

wenhuchen · Aug 15 '18 00:08

It is dropout applied to the Gumbel noise. Please check the README for details.
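As a rough illustration of "dropout applied to the Gumbel noise", the sketch below computes a Gumbel-sigmoid gate `sigmoid((logit + B * U) / tau)`, where `U` is logistic noise and `B` is a Bernoulli keep mask like the one in the quoted snippet. This is an assumption about how the pieces fit together, not the repository's code: the name `noisy_gate`, the temperature `tau=0.9`, and the choice to use no noise at evaluation time are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def noisy_gate(logits, p=0.5, tau=0.9, training=True, rng=random):
    """Hypothetical Gumbel-sigmoid gate with dropout on the noise.

    p is the probability of KEEPING the noise for each unit.
    """
    if not training:
        # assumption: no noise at evaluation time, tau still applied
        return [sigmoid(a / tau) for a in logits]
    gates = []
    for a in logits:
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        noise = math.log(u) - math.log(1.0 - u)   # logistic noise (U)
        keep = 1.0 if rng.random() < p else 0.0    # Bernoulli mask (B)
        gates.append(sigmoid((a + noise * keep) / tau))
    return gates


rng = random.Random(1)
logits = [-2.0, 0.0, 2.0]
print(noisy_gate(logits, training=False))
print(noisy_gate(logits, p=0.5, rng=rng))
```

In this sketch `p=0.0` reduces the training-time gate to the deterministic one, while `p=1.0` keeps noise on every unit, which is what removing `self.B` amounts to.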

zhuohan123 · Aug 15 '18 02:08

Thanks. Somehow I missed that part in the README.

In our experiment, we arbitrarily set p=0.5, but the loss stopped decreasing after a few epochs. We then removed self.B entirely, and training continued normally. In the end, the outputs of the LSTM gates were more skewed towards 0 and 1 (closer to a Bernoulli distribution) than before, but the end-to-end accuracy was slightly lower than with a plain LSTM. So my conclusion is that G2-LSTM is not a universal drop-in improvement for every task. The idea is very profound, though.
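"More skewed towards 0 and 1" can be made measurable. The sketch below shows one hypothetical way to do it, not the metric used in the experiment described above: count the fraction of gate activations within `eps` of either 0 or 1. The name `saturation_fraction`, the threshold `eps=0.1`, and the gate values are illustrative toy numbers, not measurements.

```python
def saturation_fraction(gates, eps=0.1):
    """Fraction of gate activations within eps of 0 or 1 (hypothetical metric)."""
    if not gates:
        raise ValueError("gates must be non-empty")
    saturated = sum(1 for g in gates if g <= eps or g >= 1.0 - eps)
    return saturated / len(gates)


# toy gate activations, purely illustrative
before = [0.3, 0.5, 0.92, 0.05, 0.6]
after = [0.02, 0.97, 0.99, 0.4, 0.01]
print(saturation_fraction(before))  # 0.4
print(saturation_fraction(after))   # 0.8
```

Computing this on the same batch for a plain LSTM and a noise-trained LSTM would give a single number to compare alongside accuracy.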

Mathematically, does it even make sense to apply such dropout to the Gumbel noise? Randomly removing the noise from part of the population just produces a mixture of two distributions: noise-free gates and noisy gates.
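The "two distributions" point can be stated precisely. If `B ~ Bernoulli(p)` is independent of logistic noise `U` (mean 0, variance π²/3), then `B * U` is a mixture: a point mass at 0 with weight `1 - p` plus a Logistic(0, 1) component with weight `p`. Its variance is `E[B²] E[U²] = p * π² / 3`, and it is not a rescaled logistic because of the atom at 0. A short Monte Carlo check, assuming `U` is sampled as `log(u) - log(1 - u)` and using only the standard library:

```python
import math
import random


def sample_masked_noise(n, p, rng):
    """Draw n samples of B * U with B ~ Bernoulli(p), U ~ Logistic(0, 1)."""
    samples = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        logistic = math.log(u) - math.log(1.0 - u)
        samples.append(logistic if rng.random() < p else 0.0)
    return samples


p = 0.5
rng = random.Random(42)
xs = sample_masked_noise(100_000, p, rng)
atom = sum(1 for x in xs if x == 0.0) / len(xs)  # mass of the point at 0
var = sum(x * x for x in xs) / len(xs)           # mean is 0, so E[x^2] = Var
print(f"P(noise == 0) = {atom:.3f}  (expected {1 - p:.3f})")
print(f"Var(noise)    = {var:.3f}  (expected {p * math.pi ** 2 / 3:.3f})")
```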

And just out of curiosity, have you tried applying the same trick to GRU gates?

felixhao28 · Aug 15 '18 03:08