gpt-fast

Rotary Embeddings Implementation

Open hello-fri-end opened this issue 6 months ago • 0 comments

I was comparing the rotary embedding implementation in this repository with the implementations in the official Llama and DeepSeek repositories using this Jupyter notebook: link. In the Llama and DeepSeek repositories, complex multiplication is used to perform the rotation of the q and k values, whereas here it is implemented more explicitly. Mathematically, I understand the two methods are equivalent, since:

$$(x + yi) \cdot (\cos t + i \sin t) = (x \cdot \cos t - y \cdot \sin t) + i \cdot (x \cdot \sin t + y \cdot \cos t)$$

  • LHS: Used in the Llama and DeepSeek implementations
  • RHS: Used in the GPT-Fast implementation

As demonstrated in the notebook, the complex multiplication approach is significantly faster. Maybe I'm missing something, but is there a reason the explicit method is preferred here?
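
For concreteness, here is a minimal self-contained sketch of the two forms I'm comparing. The function names, tensor shapes, and the adjacent-pair channel layout are illustrative assumptions for this toy example and are not taken verbatim from either repository:

```python
import torch

def rotate_complex(q: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # LHS form: pair adjacent channels into complex numbers and multiply by e^{i t}.
    q_c = torch.view_as_complex(q.float().reshape(*q.shape[:-1], -1, 2))
    rot = torch.polar(torch.ones_like(freqs), freqs)  # cos t + i sin t
    return torch.view_as_real(q_c * rot).flatten(-2).type_as(q)

def rotate_explicit(q: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    # RHS form: expand the complex product into explicit real arithmetic.
    x, y = q.float().reshape(*q.shape[:-1], -1, 2).unbind(-1)
    cos, sin = freqs.cos(), freqs.sin()
    out = torch.stack([x * cos - y * sin, x * sin + y * cos], dim=-1)
    return out.flatten(-2).type_as(q)

q = torch.randn(2, 8, 16, 64)   # (batch, heads, seq, head_dim), hypothetical sizes
freqs = torch.randn(16, 32)     # (seq, head_dim // 2), broadcasts over batch/heads
print(torch.allclose(rotate_complex(q, freqs), rotate_explicit(q, freqs), atol=1e-5))
```

Both functions produce the same values up to floating-point error, so the question is purely about why one formulation is preferred over the other here.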

hello-fri-end · Apr 09 '25, 08:04