gpt-fast
                        Rotary Embeddings Implementation
I was comparing the rotary embedding implementation in this repository with the implementations in the official Llama and DeepSeek repositories using this Jupyter notebook: link. The Llama and DeepSeek repositories use complex multiplication to rotate the q and k values, whereas the rotation is implemented more explicitly here. Mathematically, I understand the two methods are equivalent, since:
$$(x + yi) \cdot (\cos t + i \sin t) = (x \cdot \cos t - y \cdot \sin t) + i \cdot (x \cdot \sin t + y \cdot \cos t)$$
- LHS: the complex-multiplication form used in the Llama and DeepSeek implementations
- RHS: the explicit form used in the gpt-fast implementation
As demonstrated in the notebook, the complex-multiplication approach is significantly faster. Maybe I'm missing something, but is there a reason the explicit method is preferred here?
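For concreteness, here is a minimal sketch contrasting the two approaches, not the actual code from any of these repositories. The shapes, the base of 10000, and the interleaved (real, imag) pair layout are illustrative assumptions; the final assert checks that both paths produce the same rotation:

```python
import torch

def rotate_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # LHS: view the last dim as interleaved (real, imag) pairs and rotate
    # via complex multiplication, in the style of the Llama/DeepSeek code.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

def rotate_explicit(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # RHS: the same rotation written out with explicit real arithmetic,
    # in the style of the gpt-fast code.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2).type_as(x)

# Toy shapes (hypothetical): (batch, seq_len, n_heads, head_dim)
x = torch.randn(1, 8, 4, 16)
t = torch.arange(8, dtype=torch.float32)
inv_freq = 1.0 / (10000 ** (torch.arange(0, 16, 2).float() / 16))
angles = torch.outer(t, inv_freq)                         # (seq_len, head_dim // 2)
freqs_cis = torch.polar(torch.ones_like(angles), angles)  # cos t + i sin t

# Broadcast the per-position angles over batch and heads.
cos = angles.cos()[None, :, None, :]
sin = angles.sin()[None, :, None, :]
fc = freqs_cis[None, :, None, :]

assert torch.allclose(rotate_complex(x, fc), rotate_explicit(x, cos, sin), atol=1e-5)
```

Both functions compute the identical rotation; they differ only in whether the two real multiplies and the add/subtract are hidden inside a complex multiply or spelled out, which is what makes the timing difference in the notebook surprising.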