Attention_Network_With_Keras
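# at_LSTM: presumably a Keras LSTM built with return_state=True, so it returns [output, state_h, state_c]; for one step, output == state_h
h, _, c = at_LSTM(context, initial_state=[h, c])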
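To make the unpacking in that line concrete, here is a minimal NumPy sketch of a single LSTM step. It assumes at_LSTM behaves like a Keras LSTM called with return_state=True on a one-step sequence. lstm_step, W, U and b are hypothetical names, and the gate layout (input, forget, candidate, output) is chosen for illustration rather than taken from the repository. The point it shows is that the previous output h already enters every gate through U.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step (hypothetical NumPy stand-in for at_LSTM).

    x: (batch, n_in) input for this step, here the attention context
    h, c: (batch, n_s) previous hidden state (= previous output) and cell state
    W: (n_in, 4*n_s), U: (n_s, 4*n_s), b: (4*n_s,)
    Returns [output, h_new, c_new], mimicking return_state=True on a
    length-1 sequence, where output and h_new are the same array.
    """
    n_s = h.shape[-1]
    z = x @ W + h @ U + b              # previous output h feeds every gate here
    i = sigmoid(z[:, :n_s])            # input gate
    f = sigmoid(z[:, n_s:2 * n_s])     # forget gate
    g = np.tanh(z[:, 2 * n_s:3 * n_s]) # candidate cell values
    o = sigmoid(z[:, 3 * n_s:])        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return [h_new, h_new, c_new]

# Usage: same unpacking pattern as in the snippet above.
rng = np.random.default_rng(0)
batch, n_in, n_s = 2, 3, 4
W = rng.normal(size=(n_in, 4 * n_s))
U = rng.normal(size=(n_s, 4 * n_s))
b = np.zeros(4 * n_s)
context = rng.normal(size=(batch, n_in))
h = np.zeros((batch, n_s))
c = np.zeros((batch, n_s))
h, _, c = lstm_step(context, h, c, W, U, b)
```

Because the output and state_h are identical for a single step, discarding the middle value with `_` loses nothing.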
Why not feed the output of the previous time step into the next time step as input, together with the context?
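That is technically correct, but it is more idiomatic to keep the previous output and the true input separate. The previous output is already fed back into the next time step through the recurrent state (h is passed in via initial_state=[h, c], and an LSTM cell uses h in every gate), so the true input at each step should be just the context.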
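The two options can be compared side by side in a small NumPy sketch of the post-attention decoder loop. This is an illustration under assumptions, not the repository's code: decode, lstm_step and the params tuple are hypothetical, and the per-step context vectors are taken as precomputed instead of coming from an attention layer. With feed_prev=False the input is only the context and the previous output arrives through h; with feed_prev=True the previous output is concatenated onto the context, which also works but makes the kernel W larger and feeds the same information in twice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    # Standard LSTM cell: the previous output h reaches every gate through U.
    n_s = h.shape[-1]
    z = x @ W + h @ U + b
    i, f = sigmoid(z[:, :n_s]), sigmoid(z[:, n_s:2 * n_s])
    g, o = np.tanh(z[:, 2 * n_s:3 * n_s]), sigmoid(z[:, 3 * n_s:])
    c = f * c + i * g
    return o * np.tanh(c), c

def decode(contexts, h, c, params, feed_prev=False):
    """Run a post-attention LSTM over a list of per-step context vectors.

    feed_prev=False: x_t = context_t (idiomatic; h carries the previous output).
    feed_prev=True:  x_t = [h_{t-1}, context_t] (the alternative in the question).
    """
    W, U, b = params
    outputs = []
    for context in contexts:
        x = np.concatenate([h, context], axis=-1) if feed_prev else context
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs, axis=1)  # (batch, Ty, n_s)

# Usage with random weights, only to show the shapes involved.
rng = np.random.default_rng(0)
batch, n_ctx, n_s, Ty = 2, 3, 4, 5
contexts = [rng.normal(size=(batch, n_ctx)) for _ in range(Ty)]
h0, c0 = np.zeros((batch, n_s)), np.zeros((batch, n_s))
U, b = rng.normal(size=(n_s, 4 * n_s)), np.zeros(4 * n_s)
W_ctx = rng.normal(size=(n_ctx, 4 * n_s))        # input = context only
W_cat = rng.normal(size=(n_s + n_ctx, 4 * n_s))  # input = [prev output, context]
idiomatic = decode(contexts, h0, c0, (W_ctx, U, b))
with_prev = decode(contexts, h0, c0, (W_cat, U, b), feed_prev=True)
```

Only W changes shape between the two variants; the recurrent weights U, which already carry the previous output, are the same in both.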
Does this answer your question? c: