Positional-Encoding: Some confusion about the position encoder

Open · GuoShi28 opened this issue 5 years ago · 1 comment

PE(pos, 2i) = sin(pos / 10000^(2i/dim)): why is the constant set to 10000? Does it have some particular meaning for this task?
I do not fully understand the position encoder. Why are the sin and cos functions used alternately?

Looking forward to your reply. Thank you.

Apr 19 '19 08:04 GuoShi28

As far as I know, positional encoding is a technique used in the Transformer model. The term sin(pos / 10000^(2i/dim)) is a sinusoid: a function that can be obtained from the ordinary sine function by shifting, stretching, or compressing it.
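To make the formula concrete, here is a minimal pure-Python sketch of the full encoding table. It assumes the layout described in Section 3.5 of the paper: even dimensions 2i use sin(pos / 10000^(2i/dim)) and odd dimensions 2i+1 use cos with the same frequency. The paper also notes that the wavelengths form a geometric progression from 2π to 10000 · 2π, so the constant 10000 sets the longest wavelength. The function name `positional_encoding` and its `base` parameter are hypothetical, chosen for illustration rather than taken from this repository.

```python
import math

def positional_encoding(max_len, dim, base=10000.0):
    """Return a max_len x dim table of sinusoidal positional encodings.

    Even columns 2i hold sin(pos / base^(2i/dim)); odd columns 2i+1 hold
    cos(pos / base^(2i/dim)), so each sin/cos pair shares one frequency.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even so sin/cos columns pair up")
    table = []
    for pos in range(max_len):
        row = [0.0] * dim
        for i in range(dim // 2):
            # Angular frequency of pair i: 1 / base^(2i/dim).
            # It shrinks geometrically from 1 (i = 0) towards 1/base.
            freq = 1.0 / (base ** (2 * i / dim))
            row[2 * i] = math.sin(pos * freq)
            row[2 * i + 1] = math.cos(pos * freq)
        table.append(row)
    return table

# The wavelength of pair i is 2*pi*base^(2i/dim): it starts at 2*pi
# and grows towards base*2*pi for the last pairs.
pe = positional_encoding(max_len=50, dim=8)
print(pe[0])  # → [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

At position 0 every sine is 0 and every cosine is 1; later positions rotate each pair at its own speed, with low dimensions changing quickly and high dimensions changing slowly.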

According to the "Attention Is All You Need" paper, the authors chose the sinusoidal function because they hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset k, PE_(pos+k) can be represented as a linear function of PE_pos.
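This linear-function property is also why sin and cos come in pairs. By the angle-addition identities, for a pair with frequency w, sin(w(pos+k)) = cos(wk)·sin(w·pos) + sin(wk)·cos(w·pos) and cos(w(pos+k)) = -sin(wk)·sin(w·pos) + cos(wk)·cos(w·pos). The 2x2 matrix that maps one position to the next depends only on k, but it needs both the sine and the cosine at pos; a sine column alone would not be enough. The sketch below checks this numerically. The helper names `pair_encoding`, `shift_matrix`, and `matvec` are hypothetical, and the check covers a few sample positions rather than proving the identity.

```python
import math

BASE = 10000.0  # constant from the formula in the question

def pair_encoding(pos, i, dim):
    """(sin, cos) values of pair i of the sinusoidal encoding at position pos."""
    w = 1.0 / (BASE ** (2 * i / dim))
    return math.sin(pos * w), math.cos(pos * w)

def shift_matrix(k, i, dim):
    """2x2 matrix M with (sin, cos) at pos+k == M @ (sin, cos) at pos.

    It depends only on the offset k and the pair's frequency, not on pos.
    """
    w = 1.0 / (BASE ** (2 * i / dim))
    c, s = math.cos(k * w), math.sin(k * w)
    return [[c, s], [-s, c]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

dim, k = 8, 5
for i in range(dim // 2):
    m = shift_matrix(k, i, dim)  # one fixed matrix per pair for this k
    for pos in (0, 3, 17):
        predicted = matvec(m, pair_encoding(pos, i, dim))
        actual = pair_encoding(pos + k, i, dim)
        assert all(abs(p - a) < 1e-9 for p, a in zip(predicted, actual))
print("PE(pos + k) is a fixed linear map of PE(pos) for every pair")
```

Stacking these 2x2 blocks along the diagonal gives one dim x dim matrix that maps the whole vector PE_pos to PE_(pos+k), which is the linear relationship the paper refers to.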

You can find this in Section 3.5 of the "Attention Is All You Need" paper: https://arxiv.org/pdf/1706.03762.pdf

Jun 27 '20 07:06 YeonwooSung
