LSTM hidden layer computation
Hi, I'm just wondering why you use this form in https://github.com/coreylynch/grid-lstm/blob/master/model/GridLSTM.lua#L31:

```lua
local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
```

while in the paper it is

```lua
local next_h = nn.Tanh()(nn.CMulTable()({out_gate, next_c}))
```
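For reference, here is a minimal runnable sketch (Torch/nngraph; the `nn.Identity` inputs and the sample values are just illustrative stand-ins, not code from the repo) showing that the two forms are not numerically equivalent:

```lua
require 'nn'
require 'nngraph'

-- Stand-ins for the output gate activation and the updated cell state;
-- in the real model these come from the rest of the LSTM graph.
local out_gate = nn.Identity()()
local next_c = nn.Identity()()

-- Repo form (standard LSTM output): h = o .* tanh(c)
local h_repo = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})

-- Paper form: h = tanh(o .* c)
local h_paper = nn.Tanh()(nn.CMulTable()({out_gate, next_c}))

local g = nn.gModule({out_gate, next_c}, {h_repo, h_paper})

-- The two differ whenever o is not 1 or |c| is large:
print(g:forward({torch.Tensor{0.5}, torch.Tensor{2.0}}))
-- repo form:  0.5 * tanh(2.0) ~ 0.482
-- paper form: tanh(0.5 * 2.0) ~ 0.762
```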
Thanks in advance for your response.
I also have a question about weight sharing: for the time LSTM in your example, weights are not shared between layers (they are shared only across time, via the clones), while for the depth LSTM, weights are shared across both layers and time. This actually makes a lot of sense.
But it surprised me at first read, because the "tied N-LSTM" is, by definition, sharing weights along all dimensions.
Either
- NOT cloning the weights of the depth LSTM across time, or
- ALSO sharing the weights of the time LSTM across depth
would seem more coherent (see the sketch below)... what do you think?
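For what it's worth, here is a toy sketch of the two sharing schemes I mean (plain Torch `nn`, with `nn.Linear` standing in for the LSTM cells; the names are hypothetical, not from the repo). Cloning with parameter names ties those parameters, so the clones train as one:

```lua
require 'nn'

local T, L = 5, 3  -- timesteps, layers

-- Time LSTM: one prototype PER LAYER, cloned (shared) across time only.
local time_protos = {}
for l = 1, L do time_protos[l] = nn.Linear(10, 10) end
local time_clones = {}
for l = 1, L do
  time_clones[l] = {}
  for t = 1, T do
    -- shares weight/bias (and their gradients) with the layer's prototype
    time_clones[l][t] = time_protos[l]:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end
end

-- Depth LSTM: a SINGLE prototype, cloned (shared) across both layers and time.
local depth_proto = nn.Linear(10, 10)
local depth_clones = {}
for l = 1, L do
  depth_clones[l] = {}
  for t = 1, T do
    depth_clones[l][t] = depth_proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end
end
```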
Thanks,
I think the paper uses the same weights for both the time and depth LSTMs; you can refer to Section 4.3 of the paper.