attention-is-all-you-need-pytorch
What is the meaning of trg_pad_idx in the label smoothing loss?
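For context, here is a minimal sketch of how a padding index is typically used in a label-smoothed loss. It assumes trg_pad_idx is the vocabulary index of the padding token in the target sequences, as the name suggests, and that positions holding that index are meant to be left out of the loss. The function name label_smoothed_loss, the eps value of 0.1, and the toy inputs are illustrative assumptions, not code taken from the repository. It is written in plain Python so it runs without PyTorch.

```python
import math


def label_smoothed_loss(logits, targets, trg_pad_idx, eps=0.1):
    """Sum of label-smoothed cross-entropy over non-padding target positions.

    logits:  list of per-position score lists, each of length n_class
    targets: list of gold class indices, one per position
    (Hypothetical helper for illustration only.)
    """
    n_class = len(logits[0])
    total = 0.0
    for scores, gold in zip(logits, targets):
        if gold == trg_pad_idx:
            # Padding position: it exists only to make sequences in a batch
            # the same length, so it contributes nothing to the loss.
            continue
        # Numerically stable log-softmax over the class scores.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        log_probs = [s - log_z for s in scores]
        # Smoothed target: 1 - eps on the gold class, eps shared by the rest.
        for k, lp in enumerate(log_probs):
            weight = 1.0 - eps if k == gold else eps / (n_class - 1)
            total -= weight * lp
    return total


PAD = 0  # assumed padding index for this toy example
logits = [[0.1, 2.0, -1.0], [0.5, 0.2, 0.3], [1.0, 1.0, 1.0]]
targets = [1, 2, PAD]  # the last position is padding
loss = label_smoothed_loss(logits, targets, trg_pad_idx=PAD)
```

Under these assumptions, changing the scores at the padded position leaves the loss unchanged, which is the point of passing the padding index: the model is not penalized for whatever it predicts after the real target sentence has ended.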