practical-pytorch
For the attention model, why do we need the last context?
In https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb, why do we need to feed `last_context` as an RNN input?
```python
# Combine embedded input word and last context, run through RNN
rnn_input = torch.cat((word_embedded, last_context.unsqueeze(0)), 2)
rnn_output, hidden = self.gru(rnn_input, last_hidden)
```
Actually, `context` is used in the output layer:

```python
output = F.log_softmax(self.out(torch.cat((rnn_output, context), 1)))
```
In your explanation:

> RNN layer(s) $f$ with inputs $(s_{i-1}, y_{i-1}, c_i)$ and internal hidden state, outputting $s_i$:
>
> ```python
> rnn_input = concat(embedded, context)
> rnn_output, rnn_hidden = rnn(rnn_input, last_hidden)
> ```
>
> an output layer $g$ with inputs $(y_{i-1}, s_i, c_i)$, outputting $y_i$
But in the RNN, you are actually using $c_{i-1}$, not $c_i$, right? Also, in the original paper, they did not feed $c$ into the RNN, right?
Thanks!
The paper mentions an output layer $g$ with those arguments after the RNN state $s_i$. I found that adding the (just-calculated) context $c_i$ as another input to that output layer was helpful, but adding the previous output $y_{i-1}$ was not (in terms of training complexity/time vs. results).
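To make the data flow concrete, here is a minimal, self-contained sketch of one decoder step in this style: the GRU $f$ consumes the *previous* context $c_{i-1}$ alongside the embedded word, and the output layer $g$ consumes the *newly computed* context $c_i$. The dot-product attention score is a stand-in assumption here (the notebook uses a separate `Attn` module), and all tensor sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
hidden_size, vocab_size, batch, seq_len = 8, 10, 1, 5

embedding = nn.Embedding(vocab_size, hidden_size)
# GRU input is the embedded word concatenated with the previous context,
# so its input size is 2 * hidden_size
gru = nn.GRU(2 * hidden_size, hidden_size)
out = nn.Linear(2 * hidden_size, vocab_size)

# State carried between decoder steps (zeros here just for illustration)
word_input = torch.tensor([3])                    # previous output token y_{i-1}
last_context = torch.zeros(batch, hidden_size)    # c_{i-1}
last_hidden = torch.zeros(1, batch, hidden_size)  # s_{i-1}
encoder_outputs = torch.randn(seq_len, batch, hidden_size)

word_embedded = embedding(word_input).view(1, batch, -1)

# f: combine embedded word and the *previous* context c_{i-1}, run through GRU
rnn_input = torch.cat((word_embedded, last_context.unsqueeze(0)), 2)
rnn_output, hidden = gru(rnn_input, last_hidden)  # rnn_output is s_i

# Attention over encoder outputs using the new state s_i
# (dot-product score assumed; the notebook's Attn module differs)
scores = torch.sum(rnn_output * encoder_outputs, dim=2)  # (seq_len, batch)
attn_weights = F.softmax(scores.t(), dim=1)              # (batch, seq_len)
context = attn_weights.unsqueeze(1).bmm(
    encoder_outputs.transpose(0, 1)).squeeze(1)          # c_i: (batch, hidden)

# g: output layer sees the new state s_i and the *new* context c_i
output = F.log_softmax(
    out(torch.cat((rnn_output.squeeze(0), context), 1)), dim=1)
```

On the next step, `context` becomes `last_context`, which is why the GRU effectively sees $c_{i-1}$ while the output layer sees $c_i$.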
Hi @spro, I have the same question. Could you please point out which sentence mentions $g$? Thanks! The paper is at https://arxiv.org/pdf/1508.04025.pdf