transformer-xl
Sin/Cos concatenation in Positional Embeddings
This is how the positional embedding matrix is constructed in the code:
sinusoid_inp = torch.ger(pos_seq, self.inv_freq)
pos_emb = torch.cat([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1)
This creates a matrix of the form [sin | cos], whereas other implementations, including the original "Attention Is All You Need" paper, interleave the sin and cos values along the embedding dimension. Does this have anything to do with the relative positional embedding?
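For concreteness, here is a minimal sketch of the two layouts (hypothetical `d_model` and `seq_len`, with `inv_freq` built the same way as in the repo's `PositionalEmbedding` module); it checks that the concatenated [sin | cos] matrix is just a fixed column permutation of the interleaved one:

```python
import torch

d_model, seq_len = 8, 6  # hypothetical sizes, for illustration only

# inverse frequencies, built as in the repo: 1 / 10000^(2i / d_model)
inv_freq = 1.0 / (10000 ** (torch.arange(0.0, d_model, 2.0) / d_model))
pos_seq = torch.arange(seq_len, dtype=torch.float)

# outer product of positions and frequencies: [seq_len, d_model // 2]
sinusoid_inp = torch.ger(pos_seq, inv_freq)

# Transformer-XL layout: [sin | cos] concatenated along the feature dim
pos_emb_concat = torch.cat([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1)

# "Attention Is All You Need" layout: sin and cos interleaved
pos_emb_inter = torch.stack(
    [sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1
).reshape(seq_len, d_model)

# the concatenated form is a fixed column permutation of the interleaved form
idx = torch.cat([torch.arange(0, d_model, 2), torch.arange(1, d_model, 2)])
assert torch.allclose(pos_emb_concat, pos_emb_inter[:, idx])
```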
Thanks!
https://github.com/kimiyoung/transformer-xl/issues/8#issuecomment-455187360
For the position embedding, the two layouts are equivalent, simply because the embedding is consumed by a matrix multiplication, which is invariant to a permutation of the embedding dimension (the permutation can be absorbed into the weight matrix).
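To see that equivalence concretely, here is a small self-contained sketch (hypothetical sizes; `W` is a stand-in for whatever learned projection consumes the positional embedding, e.g. the `r_net` linear layer in the repo): permuting the embedding columns is exactly compensated by permuting the rows of the weight matrix, so the two layouts produce identical projections.

```python
import torch

torch.manual_seed(0)
d_model, seq_len, d_proj = 8, 6, 4  # hypothetical sizes

inv_freq = 1.0 / (10000 ** (torch.arange(0.0, d_model, 2.0) / d_model))
sinusoid_inp = torch.ger(torch.arange(seq_len, dtype=torch.float), inv_freq)

# concatenated [sin | cos] layout vs. interleaved sin/cos layout
pos_emb_concat = torch.cat([sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1)
pos_emb_inter = torch.stack(
    [sinusoid_inp.sin(), sinusoid_inp.cos()], dim=-1
).reshape(seq_len, d_model)

# permutation taking interleaved columns to concatenated columns
idx = torch.cat([torch.arange(0, d_model, 2), torch.arange(1, d_model, 2)])

# W stands in for the learned projection that consumes the embedding
W = torch.randn(d_model, d_proj)

# permuting the rows of W absorbs the change of layout, so both
# conventions can realize exactly the same projected embeddings
assert torch.allclose(pos_emb_inter @ W, pos_emb_concat @ W[idx])
```

In other words, whatever one layout computes after its first linear projection, the other layout can compute with a reordered weight matrix, so the choice of concatenation vs. interleaving does not change the model's expressiveness.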