Transformer tutorial multiplies embeddings by sqrt(d_model)
https://github.com/pytorch/tutorials/blob/5e772fa2bf406598103e61e628a0ca0b8e471bfa/beginner_source/translation_transformer.py#L135
src = self.embedding(src) * math.sqrt(self.d_model)
Shouldn't this be
src = self.embedding(src) / math.sqrt(self.d_model)
At least that is the impression I got from reading the "Attention Is All You Need" paper. Or is there newer research showing that multiplying works better?
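To make the difference concrete, here is a minimal sketch in plain Python (not the tutorial's code) of how the two variants change the magnitude of a single embedding vector. It assumes the entries are drawn from N(0, 1), which is the default initialization of `nn.Embedding`; `scaled_norms` is a hypothetical helper name used only for this illustration.

```python
import math
import random


def scaled_norms(d_model: int, seed: int = 0):
    """Return the L2 norm of one toy embedding vector under both scalings."""
    rng = random.Random(seed)
    # Assumption: stand-in for one row of an embedding table, with
    # d_model entries drawn from N(0, 1).
    vec = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
    norm = math.sqrt(sum(x * x for x in vec))
    scale = math.sqrt(d_model)
    # First value: embedding * sqrt(d_model); second: embedding / sqrt(d_model).
    return norm * scale, norm / scale


multiplied, divided = scaled_norms(d_model=512)
print(f"multiplied: {multiplied:.2f}, divided: {divided:.4f}")
```

The two norms differ by a factor of `d_model`, so the choice between multiplying and dividing is not a cosmetic one.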