multiplicative-lstm-tensorflow

sigmoid of output

Open lspinheiro opened this issue 7 years ago • 3 comments

In lines 159 and 161, shouldn't tf.sigmoid be applied to the output as well?

lspinheiro, Jun 19 '17 01:06

Hi,

Thank you for your comment.

In the original paper, Eq. (16) says h = tanh(c * o), and I think that is what the source code implements. The paper says "This is slightly different from the typical LSTM variant...", and in the typical LSTM the output h can be calculated as h = sigmoid(c) * o. I guess you mean the second formulation is better; however, I can't conclude for now which one is better, because performance depends on the task it is applied to. Adding a flag variable to switch between the implementations is a possible choice, I think.
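A minimal sketch of what such a flag could look like, written in plain Python so it runs without TensorFlow. The function name `lstm_output` and the flag `gate_inside_tanh` are hypothetical and do not come from the repository. The second branch follows the formula exactly as written in this comment; standard LSTM references more commonly write tanh(c) * o for that step.

```python
import math


def sigmoid(x):
    """Logistic sigmoid for a single float."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_output(c, o, gate_inside_tanh=True):
    """Compute the hidden output h from cell state c and output gate o.

    c and o are equal-length lists of floats. The flag name is an
    assumption made for this sketch, not an option of the repository.
    """
    if gate_inside_tanh:
        # Formulation attributed to Eq. (16) in this thread: h = tanh(c * o)
        return [math.tanh(ci * oi) for ci, oi in zip(c, o)]
    # Alternative as written in this thread: h = sigmoid(c) * o
    return [sigmoid(ci) * oi for ci, oi in zip(c, o)]
```

Calling `lstm_output(c, o)` gives the first formulation, and `lstm_output(c, o, gate_inside_tanh=False)` gives the second, so both could be compared on the same task.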

Best regards, Akira

tam17aki, Jun 19 '17 09:06

Sorry, I meant tf.sigmoid(o). I couldn't see where the sigmoid function is applied to "o" (Eq. 14 or Eq. 21 from the paper).

After i, j, f, o = tf.split(lstm_matrix, 4, 1) you applied tf.sigmoid individually to the components, but I couldn't see it being applied to the "o" component (it should be somewhere between lines 135 and 161, I guess).
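To make the point concrete, here is a rough sketch of the step in question, in plain Python rather than TensorFlow so it can be run on its own. The helper names `split_gates` and `gate_activations` are made up for this illustration and are not taken from the repository. The sketch assumes the pre-activations for one example are a flat list that splits into four equal parts in the order i, j, f, o.

```python
import math


def sigmoid(x):
    """Logistic sigmoid for a single float."""
    return 1.0 / (1.0 + math.exp(-x))


def split_gates(row):
    """Split one row of pre-activations into four equal chunks: i, j, f, o."""
    n = len(row) // 4
    return tuple(row[k * n:(k + 1) * n] for k in range(4))


def gate_activations(row):
    """Apply sigmoid to the input, forget, and output gates.

    j (the candidate input) is returned untouched here; which
    activation it gets is outside the scope of this sketch.
    """
    i, j, f, o = split_gates(row)
    i = [sigmoid(v) for v in i]
    f = [sigmoid(v) for v in f]
    o = [sigmoid(v) for v in o]  # the step this issue is asking about
    return i, j, f, o
```

With this step in place, the output gate stays in the range (0, 1) before it is combined with the cell state.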

Great implementation, by the way.

lspinheiro, Jun 19 '17 23:06

Hi,

I understand what you mean. I'll check it.

tam17aki, Jun 19 '17 23:06