
Issue with gen_scores and copy_scores computation

Open · dheeraj7596 opened this issue on Apr 5, 2020 · 0 comments

The relevant code:


```python
transformed_hidden2 = self.copy_W(output).view(batch_size, self.hidden_size, 1)
copy_score_seq = torch.bmm(encoder_outputs, transformed_hidden2)  # this is linear; add an activation function before multiplying
copy_scores = torch.bmm(torch.transpose(copy_score_seq, 1, 2), one_hot_input_seq).squeeze(1)  # [b, vocab_size + seq_length]
missing_token_mask = (one_hot_input_seq.sum(dim=1) == 0)  # tokens not present in the input sequence
missing_token_mask[:, 0] = 1  # <MSK> tokens are not part of any sequence
copy_scores = copy_scores.masked_fill(missing_token_mask, -1000000.0)

gen_scores = self.out(output.squeeze(1))  # [b, vocab_size]
gen_scores[:, 0] = -1000000.0  # penalize <MSK> tokens in generate mode too
```


I have some concerns about the computation of copy_scores and gen_scores above. Please let me know if I am wrong anywhere.

1.) In the computation of copy_scores, the paper says to multiply encoder_outputs by a weight matrix, apply a non-linear activation, and then multiply the result with the decoder RNN's hidden state. Your code seems to do something entirely different: it multiplies the weight matrix with the output of the decoder RNN and then multiplies that result with encoder_outputs, and there is no non-linearity anywhere (see the sketch below).
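For reference, here is a minimal sketch of how I read the paper's copy score, psi_c(y_t = x_j) = sigma(h_j^T W_c) s_t with sigma = tanh. The function and argument names here are my own placeholders, not identifiers from this repo:

```python
import torch
import torch.nn as nn

def copy_scores_paper(encoder_outputs: torch.Tensor,
                      s_t: torch.Tensor,
                      copy_W: nn.Linear) -> torch.Tensor:
    # encoder_outputs: [b, seq_len, hidden] (the h_j); s_t: [b, hidden] (decoder state)
    transformed = torch.tanh(copy_W(encoder_outputs))  # tanh(W_c h_j): the non-linearity from the paper
    scores = torch.bmm(transformed, s_t.unsqueeze(2))  # dot each transformed h_j with s_t -> [b, seq_len, 1]
    return scores.squeeze(2)                           # [b, seq_len], one copy score per source token
```

These per-position scores would then be scattered onto the extended vocabulary (e.g. via one_hot_input_seq, as your code already does) before the joint softmax.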

2.) In the gen_scores computation, your code multiplies the decoder output by a weight matrix, whereas the paper computes this score the same way as in the attentional RNN encoder-decoder, between the one-hot encoding of the word and the decoder RNN's hidden state. This looks quite different from your implementation (see the second sketch below).
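And here is how I read the paper's generate score, psi_g(y_t = v_i) = v_i^T W_o s_t, which amounts to a bias-free linear projection of the decoder state onto the vocabulary (again, the class name is just my placeholder):

```python
import torch
import torch.nn as nn

class GenerateScorePaper(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # row i of W_o plays the role of v_i^T W_o in the paper (note: no bias term)
        self.W_o = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, s_t: torch.Tensor) -> torch.Tensor:
        # s_t: [b, hidden] decoder state -> [b, vocab_size] generate scores
        return self.W_o(s_t)
```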

Can you please let me know if I misunderstood anything?

Thanks in advance!

— dheeraj7596, Apr 5, 2020 09:04