
For the attention model, why do we need the last context?

hunkim opened this issue 6 years ago · 2 comments

In this https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb,

why do we need to feed the last_context as an rnn input?

    # Combine embedded input word and last context, run through RNN
    rnn_input = torch.cat((word_embedded, last_context.unsqueeze(0)), 2)
    rnn_output, hidden = self.gru(rnn_input, last_hidden)

Actually, context is used in the output layer:

        output = F.log_softmax(self.out(torch.cat((rnn_output, context), 1)))
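To make the two uses of the context concrete, here is a hedged sketch of one decoder step, assuming dot-product attention as a stand-in for the notebook's `Attn` module; the class and argument names are illustrative, not the notebook's exact code. It shows $c_{i-1}$ (`last_context`) entering the RNN input and the freshly computed $c_i$ (`context`) entering the output layer:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderStep(nn.Module):
    """One step of an attention decoder (illustrative sketch)."""

    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        # GRU input = embedded word + previous context, hence 2 * hidden_size
        self.gru = nn.GRU(2 * hidden_size, hidden_size)
        self.out = nn.Linear(2 * hidden_size, output_size)

    def forward(self, word_input, last_context, last_hidden, encoder_outputs):
        # word_input: (1, batch); last_context: (batch, hidden)
        # encoder_outputs: (src_len, batch, hidden)
        word_embedded = self.embedding(word_input)            # (1, batch, hidden)
        # Feed c_{i-1} alongside y_{i-1} into the RNN
        rnn_input = torch.cat((word_embedded, last_context.unsqueeze(0)), 2)
        rnn_output, hidden = self.gru(rnn_input, last_hidden)  # rnn_output = s_i

        # Compute the new context c_i from s_i and the encoder outputs
        # (dot-product attention; the notebook uses a learned Attn module)
        scores = torch.bmm(encoder_outputs.transpose(0, 1),    # (batch, src_len, hidden)
                           rnn_output.permute(1, 2, 0))        # (batch, hidden, 1)
        attn_weights = F.softmax(scores.squeeze(2), dim=1)     # (batch, src_len)
        context = torch.bmm(attn_weights.unsqueeze(1),
                            encoder_outputs.transpose(0, 1)).squeeze(1)  # (batch, hidden)

        # Output layer g uses s_i and c_i
        output = F.log_softmax(
            self.out(torch.cat((rnn_output.squeeze(0), context), 1)), dim=1)
        return output, context, hidden, attn_weights
```

The returned `context` is fed back as `last_context` on the next step, which is why the first use in the loop is the *previous* step's context.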

In your explanation:

RNN layer(s) $f$ with inputs $(s_{i-1}, y_{i-1}, c_i)$ and internal hidden state, outputting $s_i$:

    rnn_input = concat(embedded, context)
    rnn_output, rnn_hidden = rnn(rnn_input, last_hidden)

and an output layer $g$ with inputs $(y_{i-1}, s_i, c_i)$, outputting $y_i$.

But in the RNN you are actually using $c_{i-1}$, not $c_i$, right? Also, in the original paper they did not feed $c$ into the RNN, did they?

Thanks!

hunkim · Nov 08 '17

The paper mentions an output layer $g$ with those arguments, applied after the RNN state $s_i$. I found that adding the (just-calculated) context $c_i$ as another input to that output layer was helpful, but adding the previous output $y_{i-1}$ was not (in terms of training complexity/time versus results).
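The two output-layer variants under discussion can be sketched as follows; this is a minimal illustration with hypothetical sizes, not the notebook's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size, embed_size, vocab = 8, 8, 20
batch = 3
s_i = torch.randn(batch, hidden_size)    # RNN state at step i
c_i = torch.randn(batch, hidden_size)    # freshly computed context
y_prev = torch.randn(batch, embed_size)  # embedding of previous output y_{i-1}

# Variant used in the notebook: g(s_i, c_i)
out_sc = nn.Linear(2 * hidden_size, vocab)
logits_sc = F.log_softmax(out_sc(torch.cat((s_i, c_i), 1)), dim=1)

# Variant closer to the paper's g(y_{i-1}, s_i, c_i): also concatenate y_{i-1};
# per the comment above, this extra input did not pay off in practice
out_ysc = nn.Linear(2 * hidden_size + embed_size, vocab)
logits_ysc = F.log_softmax(out_ysc(torch.cat((y_prev, s_i, c_i), 1)), dim=1)
```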

spro · Jan 02 '18

Hi @spro, I have the same question. Could you please point out which sentence mentions $g$? Thanks! The paper is at https://arxiv.org/pdf/1508.04025.pdf

zyxue · May 28 '18