
Sin/Cos concatenation in Positional Embeddings

Open zainsarwar865 opened this issue 5 years ago • 1 comment

This is how the positional embedding matrix is constructed in the code:

sinusoid_inp = torch.ger(pos_seq, self.inv_freq)  # outer product: [seq_len, d_model/2]
pos_emb = torch.cat([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1)  # [sin | cos] along the last dim

This builds a matrix laid out as [sin | cos], whereas other implementations, including the original "Attention Is All You Need", interleave sin and cos along the embedding dimension. Does this have anything to do with the relative positional embedding?
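
For reference, a minimal PyTorch sketch (not the repo's code; d_model and the positions are made up for illustration) showing that the concatenated layout and the interleaved layout contain the same columns, just in a different order:

import torch

# Illustration only: d_model = 8, positions 0..4, same inv_freq formula as the repo.
d_model = 8
pos_seq = torch.arange(5, dtype=torch.float)
inv_freq = 1.0 / (10000 ** (torch.arange(0.0, d_model, 2.0) / d_model))
sinusoid_inp = torch.ger(pos_seq, inv_freq)  # [5, d_model/2]

# Transformer-XL layout: sin block then cos block, concatenated.
concat_emb = torch.cat([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1)

# "Attention Is All You Need" layout: sin and cos interleaved.
interleaved_emb = torch.stack([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1).flatten(-2)

# Same columns, different order: a fixed permutation maps one layout to the other.
perm = torch.cat([torch.arange(0, d_model, 2), torch.arange(1, d_model, 2)])
assert torch.equal(interleaved_emb[:, perm], concat_emb)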

Thanks!

zainsarwar865 avatar Aug 26 '20 18:08 zainsarwar865

https://github.com/kimiyoung/transformer-xl/issues/8#issuecomment-455187360

For the position embedding, the two layouts are equivalent, simply because the embedding is only ever consumed by a matrix multiplication, and a fixed permutation of the embedding columns can be absorbed into the learned weights.
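
A minimal sketch of that argument (hypothetical names, assuming the position embedding is only fed into a linear projection, as with the relative-position projection in Transformer-XL): permuting the embedding columns while permuting the matching input columns of the weight gives identical outputs, so a model trained on either layout can learn equivalent weights.

import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 8
pos_emb = torch.randn(5, d_model)   # stands in for either layout
perm = torch.randperm(d_model)      # any fixed column permutation

proj = nn.Linear(d_model, d_model, bias=False)
proj_perm = nn.Linear(d_model, d_model, bias=False)
with torch.no_grad():
    proj_perm.weight.copy_(proj.weight[:, perm])  # permute the weight's input columns to match

out_a = proj(pos_emb)                # original column order, original weights
out_b = proj_perm(pos_emb[:, perm])  # permuted column order, permuted weights
assert torch.allclose(out_a, out_b, atol=1e-6)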

serkansulun avatar Jan 19 '21 11:01 serkansulun