keras-language-modeling
The incorporation of attention in attention_lstm.py
In the blog post and in the related literature on attention LSTMs, attention is incorporated as:
attention_state = tanh(dot(attention_vec, W_attn) + dot(new_hidden_state, U_attn))
However, in attention_lstm.py it is incorporated as:
attention_state = tanh(dot(attention_vec, W_attn) * dot(new_hidden_state, U_attn))
Is this a typo, or do you find it to be a better way of incorporating attention?
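For concreteness, here is a minimal NumPy sketch contrasting the two formulations. The variable names follow the pseudocode above; the shapes are illustrative assumptions and are not taken from attention_lstm.py:

```python
import numpy as np

# Illustrative dimensions only (not from attention_lstm.py)
rng = np.random.default_rng(0)
hidden_dim = 8

attention_vec    = rng.standard_normal(hidden_dim)   # context / attention vector
new_hidden_state = rng.standard_normal(hidden_dim)   # h_t from the LSTM step
W_attn = rng.standard_normal((hidden_dim, hidden_dim))
U_attn = rng.standard_normal((hidden_dim, hidden_dim))

# Additive form (blog post / literature): tanh of the SUM of the two projections
attention_state_add = np.tanh(attention_vec @ W_attn + new_hidden_state @ U_attn)

# Multiplicative form (attention_lstm.py as quoted): tanh of the element-wise
# PRODUCT of the two projections
attention_state_mul = np.tanh((attention_vec @ W_attn) * (new_hidden_state @ U_attn))

print(attention_state_add)
print(attention_state_mul)
```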