gpt-fast
                        Rotary Embeddings Implementation
I was comparing the rotary embedding implementation in this repository with the implementations in the official Llama and DeepSeek repositories using this Jupyter notebook: link. The Llama and DeepSeek repositories use complex multiplication to rotate the q and k values, whereas the rotation is implemented more explicitly here. Mathematically, I understand the two methods are equivalent, since:
$$(x + yi) \cdot (\cos t + i \sin t) = (x \cdot \cos t - y \cdot \sin t) + i \cdot (x \cdot \sin t + y \cdot \cos t)$$
- LHS: the complex-multiplication form used in the Llama and DeepSeek implementations
- RHS: the explicit form used in the gpt-fast implementation
As demonstrated in the notebook, the complex-multiplication approach is significantly faster. Maybe I'm missing something, but is there a reason the explicit method is preferred here?
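For concreteness, here is a minimal sketch contrasting the two approaches, not the actual code from any of these repositories. The shapes, the base of 10000, and the interleaved (real, imag) pair layout are illustrative assumptions; the final assert checks that both paths produce the same rotation:

```python
import torch

def rotate_complex(x: torch.Tensor, freqs_cis: torch.Tensor) -> torch.Tensor:
    # LHS: view the last dim as interleaved (real, imag) pairs and rotate
    # via complex multiplication, in the style of the Llama/DeepSeek code.
    x_c = torch.view_as_complex(x.float().reshape(*x.shape[:-1], -1, 2))
    return torch.view_as_real(x_c * freqs_cis).flatten(-2).type_as(x)

def rotate_explicit(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # RHS: the same rotation written out with explicit real arithmetic,
    # in the style of the gpt-fast code.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2).type_as(x)

# Toy shapes (hypothetical): (batch, seq_len, n_heads, head_dim)
x = torch.randn(1, 8, 4, 16)
t = torch.arange(8, dtype=torch.float32)
inv_freq = 1.0 / (10000 ** (torch.arange(0, 16, 2).float() / 16))
angles = torch.outer(t, inv_freq)                         # (seq_len, head_dim // 2)
freqs_cis = torch.polar(torch.ones_like(angles), angles)  # cos t + i sin t

# Broadcast the per-position angles over batch and heads.
cos = angles.cos()[None, :, None, :]
sin = angles.sin()[None, :, None, :]
fc = freqs_cis[None, :, None, :]

assert torch.allclose(rotate_complex(x, fc), rotate_explicit(x, cos, sin), atol=1e-5)
```

Both functions compute the identical rotation; they differ only in whether the two real multiplies and the add/subtract are hidden inside a complex multiply or spelled out, which is what makes the timing difference in the notebook surprising.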