a-PyTorch-Tutorial-to-Transformers
why attend over the <end> token?
Hi @sgrvinod, in the XE train function:
predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths) # (N, max_target_sequence_pad_length_this_batch, vocab_size)
target_sequence_lengths still counts the <end> token, so in MultiHeadAttention the decoder will be attending over the <end> token's position.
I think it should be target_sequence_lengths - 1:
predicted_sequences = model(source_sequences, target_sequences, source_sequence_lengths, target_sequence_lengths - 1) # (N, max_target_sequence_pad_length_this_batch, vocab_size)
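To make the concern concrete, here is a minimal sketch (not the tutorial's actual code) of how a per-sequence length is typically turned into a key-padding mask inside attention; keys_not_padding, the toy lengths, and pad_length are made up for illustration:

```python
import torch

# Hypothetical toy batch: lengths count the <end> token, as in target_sequence_lengths.
target_sequence_lengths = torch.tensor([5, 3])  # includes <end>
pad_length = 6                                  # padded length of this batch

def keys_not_padding(lengths, pad_length):
    """Boolean mask (N, pad_length): True where a key position holds a real token."""
    positions = torch.arange(pad_length).unsqueeze(0)  # (1, pad_length)
    return positions < lengths.unsqueeze(1)            # (N, pad_length), broadcast

# With the lengths as currently passed, the <end> position is attendable:
print(keys_not_padding(target_sequence_lengths, pad_length))
# tensor([[ True,  True,  True,  True,  True, False],
#         [ True,  True,  True, False, False, False]])

# With lengths - 1, the <end> position would be masked out like padding:
print(keys_not_padding(target_sequence_lengths - 1, pad_length))
# tensor([[ True,  True,  True,  True, False, False],
#         [ True,  True, False, False, False, False]])
```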
Could you please clarify?