multiplicative-lstm-tensorflow
sigmoid of output
In lines 159 and 161, shouldn't tf.sigmoid be applied to the output as well?
Hi,
Thank you for your comment.
In the original paper, eq. (16) says h = tanh(c * o), and I think that is what the source code implements. The paper says "This is slightly different from the typical LSTM variant...", and in the typical LSTM the output h can be calculated as h = sigmoid(c) * o. I guess you mean the second formulation is better. However, I can't conclude for now which one is better, because its performance depends on the task it is applied to. Adding a flag variable to switch between the two implementations is a possible choice, I think.
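For illustration, the flag idea could look roughly like the NumPy sketch below. It is not the repository's code: the function name lstm_output and the flag paper_variant are hypothetical, and the two formulas are taken exactly as written in this reply rather than checked against the paper.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, the NumPy counterpart of tf.sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_output(c, o, paper_variant=True):
    """Compute the hidden output h from cell state c and output gate o.

    paper_variant is a hypothetical flag (not in the repository):
      True  -> h = tanh(c * o), the form quoted from eq. (16) in this thread
      False -> h = sigmoid(c) * o, the alternative quoted in this thread
    """
    if paper_variant:
        return np.tanh(c * o)
    return sigmoid(c) * o


c = np.array([0.0, 1.0, -2.0])   # example cell state
o = np.array([0.5, 0.5, 0.5])    # example (already squashed) output gate
h_paper = lstm_output(c, o, paper_variant=True)
h_alt = lstm_output(c, o, paper_variant=False)
```

A single boolean keeps both formulations in one code path, so they can be compared on the same task without maintaining two cell classes.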
Best regards, Akira
Sorry, I meant tf.sigmoid(o). I couldn't see where the sigmoid function is applied to "o" (Eq. 14 or Eq. 21 from the paper).
After i, j, f, o = tf.split(lstm_matrix, 4, 1), you applied tf.sigmoid to the components individually, but I couldn't see it being applied to the "o" component (it should be somewhere between lines 135 and 161, I guess).
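To make the point concrete, here is a minimal NumPy sketch of what is being asked for, assuming the four components are equal-width column blocks of lstm_matrix. The helper name split_gates is hypothetical and this is not the repository's code; np.split(..., 4, axis=1) stands in for tf.split(lstm_matrix, 4, 1).

```python
import numpy as np


def sigmoid(x):
    """Logistic function, the NumPy counterpart of tf.sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))


def split_gates(lstm_matrix):
    """Split pre-activations into i, j, f, o and squash the three gates.

    Hypothetical helper for illustration only.
    """
    # Four equal column blocks, mirroring tf.split(lstm_matrix, 4, 1).
    i, j, f, o = np.split(lstm_matrix, 4, axis=1)
    # The sigmoid is applied to o as well as to i and f.
    # j is returned unchanged; how the repository treats it is not
    # shown in this thread.
    return sigmoid(i), j, sigmoid(f), sigmoid(o)


# Example: batch of 2, 3 units per component -> 12 columns in total.
lstm_matrix = np.linspace(-6.0, 6.0, 24).reshape(2, 12)
i, j, f, o = split_gates(lstm_matrix)
```

With the sigmoid applied, every entry of o lies strictly between 0 and 1, which is what lets it act as a gate.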
Great implementation, by the way.
Hi,
I understand what you mean. I'll check it.