SASRec.pytorch
Is there any explanation for adjusting seqs after embedding?
I found `seqs *= self.item_emb.embedding_dim ** 0.5` in the function `log2feats(self, log_seqs)`. Is there any reason for rescaling `seqs` after the embedding lookup?
```python
seqs = self.item_emb(torch.LongTensor(log_seqs).to(self.dev))
seqs *= self.item_emb.embedding_dim ** 0.5
```
@baiyuting it's a kind of normalization operation inherited from the original BERT paper https://arxiv.org/abs/1810.04805; check Prof. Lee's (https://speech.ee.ntu.edu.tw/~hylee/index.php) Transformer+BERT lectures on YouTube/Bilibili if interested. BTW, you are encouraged to remove it if you want to try, since SASRec is not that deep compared to BERT.
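If you want to ablate it, a minimal sketch of the idea (not the repo's exact `log2feats`; `scale_emb` is a hypothetical flag added just for this illustration) could look like:

```python
import torch

class TinyEmbedder(torch.nn.Module):
    """Illustrative sketch only: embedding lookup with optional rescaling."""
    def __init__(self, num_items, hidden_units, scale_emb=True):
        super().__init__()
        self.item_emb = torch.nn.Embedding(num_items + 1, hidden_units, padding_idx=0)
        self.scale_emb = scale_emb  # hypothetical ablation switch

    def forward(self, log_seqs):  # log_seqs: (batch, maxlen) item ids, LongTensor
        seqs = self.item_emb(log_seqs)
        if self.scale_emb:
            # the line under discussion: multiply by sqrt(embedding_dim)
            seqs = seqs * self.item_emb.embedding_dim ** 0.5
        return seqs
```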
I cloned BERT from https://github.com/google-research/bert.git, but found no related code in modeling.py:embedding_lookup(). Did I miss something? Could you give a more specific elaboration, since this is a trick I had not noticed before?
Oops, my fault. BERT is just the encoder of the Transformer; you should refer to https://arxiv.org/pdf/1706.03762.pdf, section 3.2.1 on Scaled Dot-Product Attention:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
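For reference, a minimal PyTorch sketch of that formula (just to show where the √d_k scaling enters; this is not the repo's attention code):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

# toy usage
q = torch.randn(2, 5, 64)   # (batch, seq_len, d_k)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)            # torch.Size([2, 5, 64])
```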
For me, it's just a normalization operation; if you are very interested in it, please try to play with the math on your own, e.g. along the lines of https://medium.com/@shoray.goel/kaiming-he-initialization-a8d9ed0b5899.
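As a quick illustration of that normalization intuition: if, purely for this demo, the embedding weights are drawn with std 1/√d (an assumption for illustration, not necessarily the repo's init), then multiplying by √d brings the activations back to roughly unit scale:

```python
import torch

d = 64
num_items = 10000

# Illustration only: draw embedding weights with std = 1/sqrt(d)
emb = torch.nn.Embedding(num_items, d)
torch.nn.init.normal_(emb.weight, mean=0.0, std=d ** -0.5)

ids = torch.randint(0, num_items, (4, 50))   # a batch of item-id sequences
seqs = emb(ids)

print(seqs.std().item())                     # ~ 1/sqrt(d) ≈ 0.125
print((seqs * d ** 0.5).std().item())        # ~ 1 after the rescaling
```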