attention-is-all-you-need-pytorch
n_position in positional encoding
How did you choose 200 as the value of n_position? What is this number based on?
Hoping to hear back. Thank you!
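I think n_position is only used to precompute a table of sin(x) and cos(x) values for positions 0 to n_position - 1. That table is then sliced as [:SL] to match the input, so the only requirement is n_position >= SL (seq_len). In other words, 200 just acts as an upper bound on the sequence length.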
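To illustrate the point above, here is a minimal NumPy sketch, assuming the standard sinusoidal formula from the paper (sin on even dimensions, cos on odd ones, base 10000). It is not the repository's code, and the helper names `sinusoid_table` and `add_positional_encoding` are hypothetical. Each row depends only on its own position, so the first SL rows of a 200-row table are identical to a table built with n_position = SL.

```python
import numpy as np

def sinusoid_table(n_position, d_model):
    """Hypothetical helper: precompute encodings for positions 0..n_position-1."""
    positions = np.arange(n_position)[:, None]  # shape (n_position, 1)
    dims = np.arange(d_model)[None, :]          # shape (1, d_model)
    # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    table = np.zeros((n_position, d_model))
    table[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sin
    table[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cos
    return table

def add_positional_encoding(x, table):
    """Hypothetical helper: x has shape (batch, seq_len, d_model).

    Only the first seq_len rows of the precomputed table are used,
    so the table just needs at least seq_len rows.
    """
    seq_len = x.shape[1]
    if seq_len > table.shape[0]:
        raise ValueError(f"seq_len={seq_len} exceeds n_position={table.shape[0]}")
    return x + table[None, :seq_len, :]

table = sinusoid_table(200, 512)   # n_position=200 as an upper bound
x = np.zeros((2, 50, 512))         # batch of 2 sequences, 50 tokens each
out = add_positional_encoding(x, table)
```

Picking a larger n_position only costs a bit of memory for the table; picking one smaller than the longest input makes the slice too short, which is why the sketch raises an error in that case.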