BERT-pytorch
PositionalEmbedding
The position embedding in BERT is not the same as in the original Transformer. Why not use the form used in BERT?
@Yang92to Great point, I'll check out the BERT positional embedding method and update ASAP.
@codertimo The BERT positional embedding method is to simply learn an embedding for each position. So you can use nn.Embedding with a constant input sequence [0, 1, 2, ..., L-1], where L is the maximum sequence length.
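A minimal sketch of what that could look like in PyTorch. The class name `LearnedPositionalEmbedding` and the parameter names `d_model` / `max_len` are illustrative, not taken from this repo:

```python
import torch
import torch.nn as nn


class LearnedPositionalEmbedding(nn.Module):
    """Learned positional embedding in the BERT style (sketch, not the repo's implementation).

    Each position 0..max_len-1 gets its own trainable vector, instead of the
    fixed sinusoidal encoding used in the original Transformer.
    """

    def __init__(self, d_model, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch_size, seq_len) token ids -- only the sequence length is used
        seq_len = x.size(1)
        positions = torch.arange(seq_len, device=x.device)   # [0, 1, ..., seq_len-1]
        # (1, seq_len, d_model), broadcasts over the batch when added to token embeddings
        return self.embedding(positions).unsqueeze(0)


# quick check
pos_emb = LearnedPositionalEmbedding(d_model=768, max_len=512)
tokens = torch.zeros(2, 128, dtype=torch.long)  # dummy batch of token ids
print(pos_emb(tokens).shape)  # torch.Size([1, 128, 768])
```

Unlike the sinusoidal version, these vectors are trained end to end, which is how the original BERT implementation handles positions.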
@codertimo Since BERT uses learned positional embeddings, and this is one of the biggest differences between the original Transformer and BERT, I think it is quite urgent to modify the positional embedding part.