SASRec.pytorch
Is there any explanation for adjusting seqs after embedding?
I found `seqs *= self.item_emb.embedding_dim ** 0.5` in the function `log2feats(self, log_seqs)`. Is there any reason for rescaling `seqs` after the embedding lookup?
```python
seqs = self.item_emb(torch.LongTensor(log_seqs).to(self.dev))
seqs *= self.item_emb.embedding_dim ** 0.5
```
@baiyuting it's a kind of normalization operation inherited from the original BERT paper https://arxiv.org/abs/1810.04805; check Prof. Lee's (https://speech.ee.ntu.edu.tw/~hylee/index.php) Transformer+BERT lectures on YouTube/Bilibili if interested. BTW, you are encouraged to remove it if you want to try, since SASRec is not that deep compared to BERT.
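If you want to ablate it, a minimal sketch of the idea (not the repo's exact `log2feats`; `scale_emb` is a hypothetical flag added just for this illustration) could look like:

```python
import torch

class TinyEmbedder(torch.nn.Module):
    """Illustrative sketch only: embedding lookup with optional rescaling."""
    def __init__(self, num_items, hidden_units, scale_emb=True):
        super().__init__()
        self.item_emb = torch.nn.Embedding(num_items + 1, hidden_units, padding_idx=0)
        self.scale_emb = scale_emb  # hypothetical ablation switch

    def forward(self, log_seqs):  # log_seqs: (batch, maxlen) item ids, LongTensor
        seqs = self.item_emb(log_seqs)
        if self.scale_emb:
            # the line under discussion: multiply by sqrt(embedding_dim)
            seqs = seqs * self.item_emb.embedding_dim ** 0.5
        return seqs
```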
I cloned BERT from https://github.com/google-research/bert.git, but found no related code in modeling.py:embedding_lookup(). Did I miss something? Could you give a more specific elaboration, since this is a trick I had not noticed before?
Oops, my fault. BERT is just the encoder of the Transformer; you should refer to https://arxiv.org/pdf/1706.03762.pdf, section 3.2.1 on Scaled Dot-Product Attention:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
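For reference, a minimal PyTorch sketch of that formula (just to show where the √d_k scaling enters; this is not the repo's attention code):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (..., len_q, len_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    return F.softmax(scores, dim=-1) @ v

# toy usage
q = torch.randn(2, 5, 64)   # (batch, seq_len, d_k)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)            # torch.Size([2, 5, 64])
```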
For me, it's just a normalization operation; if you are very interested in it, please try to play with the math on your own, e.g. along the lines of https://medium.com/@shoray.goel/kaiming-he-initialization-a8d9ed0b5899.
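As a quick illustration of that normalization intuition: if, purely for this demo, the embedding weights are drawn with std 1/√d (an assumption for illustration, not necessarily the repo's init), then multiplying by √d brings the activations back to roughly unit scale:

```python
import torch

d = 64
num_items = 10000

# Illustration only: draw embedding weights with std = 1/sqrt(d)
emb = torch.nn.Embedding(num_items, d)
torch.nn.init.normal_(emb.weight, mean=0.0, std=d ** -0.5)

ids = torch.randint(0, num_items, (4, 50))   # a batch of item-id sequences
seqs = emb(ids)

print(seqs.std().item())                     # ~ 1/sqrt(d) ≈ 0.125
print((seqs * d ** 0.5).std().item())        # ~ 1 after the rescaling
```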