[Question] Embeddings normalization by sqrt(hidden_size)
Hello there 👋
Thanks for the repo! I have one question: why do the token embeddings need to be scaled up (normalized) by sqrt(hidden_size)? https://github.com/google/gemma_pytorch/blob/01062c9ef4cf89ac0c985b25a734164ede017d0b/gemma/model.py#L431-L432
Unfortunately, I cannot find an answer anywhere.
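For context, here is a minimal sketch of the step I am asking about (paraphrased from the linked lines; the variable names and the example `hidden_size` value are my own, not taken from the repo):

```python
import torch
import torch.nn as nn

hidden_size = 2048  # example value; the actual hidden_size depends on the Gemma variant
embedder = nn.Embedding(num_embeddings=256000, embedding_dim=hidden_size)

token_ids = torch.tensor([[1, 42, 7]])
hidden_states = embedder(token_ids)

# The step in question: the embedding outputs are multiplied by
# sqrt(hidden_size) before being passed to the transformer blocks.
normalizer = torch.tensor(hidden_size**0.5, dtype=hidden_states.dtype)
hidden_states = hidden_states * normalizer
```

Is this scaling just a convention carried over from the original Transformer paper, or does it matter specifically for Gemma (e.g. because of weight tying with the output projection)?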