Transformer
ys = trg[:, 1:].contiguous().view(-1): why do we have to discard the first position of each sequence?
Hello, may I ask a question? In this line of code: ys = trg[:, 1:].contiguous().view(-1), why do we have to discard the first position of each target sequence?
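For context, here is a small sketch of what that slice does, under assumptions that are not confirmed by the repo: that every target sequence starts with a start-of-sequence token, and that the decoder is fed the sequence without its last position (the usual teacher-forcing setup). The token ids and the names SOS, EOS, PAD, decoder_input and labels are made up for illustration, and plain Python lists stand in for tensors.

```python
# Hypothetical token ids (assumed for illustration, not taken from the repo)
SOS, EOS, PAD = 1, 2, 0

# A batch of two target sequences, each assumed to start with <sos>
trg = [
    [SOS, 11, 12, 13, EOS],
    [SOS, 21, 22, EOS, PAD],
]

# Decoder input: drop the last position of each row (list analogue of trg[:, :-1])
decoder_input = [row[:-1] for row in trg]

# Labels: drop the first position of each row (list analogue of trg[:, 1:])
labels = [row[1:] for row in trg]

# Flatten the labels into one long list, as .contiguous().view(-1) does for a tensor
ys = [tok for row in labels for tok in row]

# At each step the decoder sees one token and is asked to predict the next one
for inp, lab in zip(decoder_input[0], labels[0]):
    print(f"input {inp:>2} -> predict {lab:>2}")
```

Under these assumptions, the dropped position is only the start token: it is given to the decoder as input and never has to be predicted, so it has no place among the labels. Each remaining label lines up with the input token one step before it.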