keras-language-modeling
The incorporation of attention in attention_lstm.py
In the blog post and in the related literature on attention LSTMs, attention is incorporated as:
attention_state = tanh(dot(attention_vec, W_attn) + dot(new_hidden_state, U_attn))
However, in attention_lstm.py it is incorporated as:
attention_state = tanh(dot(attention_vec, W_attn) * dot(new_hidden_state, U_attn))
Is this a typo, or do you find it to be a better way of incorporating attention?
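For concreteness, here is a minimal NumPy sketch contrasting the two formulations. The variable names follow the pseudocode above; the shapes are illustrative assumptions and are not taken from attention_lstm.py:

```python
import numpy as np

# Illustrative dimensions only (not from attention_lstm.py)
rng = np.random.default_rng(0)
hidden_dim = 8

attention_vec    = rng.standard_normal(hidden_dim)   # context / attention vector
new_hidden_state = rng.standard_normal(hidden_dim)   # h_t from the LSTM step
W_attn = rng.standard_normal((hidden_dim, hidden_dim))
U_attn = rng.standard_normal((hidden_dim, hidden_dim))

# Additive form (blog post / literature): tanh of the SUM of the two projections
attention_state_add = np.tanh(attention_vec @ W_attn + new_hidden_state @ U_attn)

# Multiplicative form (attention_lstm.py as quoted): tanh of the element-wise
# PRODUCT of the two projections
attention_state_mul = np.tanh((attention_vec @ W_attn) * (new_hidden_state @ U_attn))

print(attention_state_add)
print(attention_state_mul)
```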