LSTM hidden layer computation
Hi, I'm just wondering why you use this form in https://github.com/coreylynch/grid-lstm/blob/master/model/GridLSTM.lua#L31:

```lua
local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
```

while in the paper it is

```lua
local next_h = nn.Tanh()(nn.CMulTable()({out_gate, next_c}))
```
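For reference, here is a minimal runnable sketch (Torch/nngraph; the `nn.Identity` inputs and the sample values are just illustrative stand-ins, not code from the repo) showing that the two forms are not numerically equivalent:

```lua
require 'nn'
require 'nngraph'

-- Stand-ins for the output gate activation and the updated cell state;
-- in the real model these come from the rest of the LSTM graph.
local out_gate = nn.Identity()()
local next_c = nn.Identity()()

-- Repo form (standard LSTM output): h = o .* tanh(c)
local h_repo = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})

-- Paper form: h = tanh(o .* c)
local h_paper = nn.Tanh()(nn.CMulTable()({out_gate, next_c}))

local g = nn.gModule({out_gate, next_c}, {h_repo, h_paper})

-- The two differ whenever o is not 1 or |c| is large:
print(g:forward({torch.Tensor{0.5}, torch.Tensor{2.0}}))
-- repo form:  0.5 * tanh(2.0) ~ 0.482
-- paper form: tanh(0.5 * 2.0) ~ 0.762
```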
Thanks in advance for your response.
I also have a question about weight sharing: for the time LSTM in your example, weights are not shared between layers (they are shared only across time, via the clones), while for the depth LSTM, weights are shared across both layers and time. This actually makes a lot of sense.
But it surprised me at first read, because the "tied N-LSTM" is, by definition, sharing weights along all dimensions.
Either
- NOT cloning the weights of the depth LSTM across time, or
- ALSO sharing the weights of the time LSTM across depth
would seem more coherent (see the sketch below)... what do you think?
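For what it's worth, here is a toy sketch of the two sharing schemes I mean (plain Torch `nn`, with `nn.Linear` standing in for the LSTM cells; the names are hypothetical, not from the repo). Cloning with parameter names ties those parameters, so the clones train as one:

```lua
require 'nn'

local T, L = 5, 3  -- timesteps, layers

-- Time LSTM: one prototype PER LAYER, cloned (shared) across time only.
local time_protos = {}
for l = 1, L do time_protos[l] = nn.Linear(10, 10) end
local time_clones = {}
for l = 1, L do
  time_clones[l] = {}
  for t = 1, T do
    -- shares weight/bias (and their gradients) with the layer's prototype
    time_clones[l][t] = time_protos[l]:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end
end

-- Depth LSTM: a SINGLE prototype, cloned (shared) across both layers and time.
local depth_proto = nn.Linear(10, 10)
local depth_clones = {}
for l = 1, L do
  depth_clones[l] = {}
  for t = 1, T do
    depth_clones[l][t] = depth_proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
  end
end
```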
Thanks,
I think the paper uses the same weights for both the time and depth LSTMs; you can refer to Section 4.3 of the paper.