Y Song issues

2 issues

[Linear Attention] Update fused_recurrent.py for inference with normalization=true

The current linear attention can save a $KV$ state cache. This works when normalization is not enabled. When normalization is enabled, the output should be $\frac{QKV}{QK1}$. We can see that...
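Here is a minimal sketch of why the $KV$ cache alone is not enough once normalization is on. It assumes causal linear attention in which a non-negative feature map has already been applied to the queries and keys. Under that assumption the recurrent state must carry a key-sum vector $z_t = \sum_{i \le t} k_i$ (the $QK1$ denominator) alongside the $KV$ state $S_t = \sum_{i \le t} k_i^\top v_i$. The helper names `parallel_normalized` and `recurrent_normalized` are hypothetical. The code is plain NumPy and does not reflect the `fused_recurrent.py` kernel or its API.

```python
import numpy as np


def parallel_normalized(q, k, v):
    """Causal normalized linear attention in parallel form: (QK^T V) / (QK^T 1)."""
    scores = np.tril(q @ k.T)                # (T, T) causal scores q_t . k_i for i <= t
    num = scores @ v                         # numerator: Q K^T V, shape (T, d_v)
    den = scores @ np.ones((k.shape[0], 1))  # denominator: Q K^T 1, shape (T, 1)
    return num / den


def recurrent_normalized(q, k, v, state=None):
    """Same computation in recurrent form, carrying a (KV state, key-sum) cache."""
    d_k, d_v = k.shape[1], v.shape[1]
    if state is None:
        S = np.zeros((d_k, d_v))  # KV state: running sum of outer(k_i, v_i)
        z = np.zeros(d_k)         # normalizer state: running sum of k_i
    else:
        S, z = state[0].copy(), state[1].copy()
    out = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        S += np.outer(k[t], v[t])
        z += k[t]
        out[t] = (q[t] @ S) / (q[t] @ z)  # q_t S_t / (q_t . z_t)
    return out, (S, z)


# Usage: positive q, k stand in for feature-mapped queries/keys (assumption).
rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
q = rng.random((T, d_k)) + 0.1
k = rng.random((T, d_k)) + 0.1
v = rng.standard_normal((T, d_v))

full = parallel_normalized(q, k, v)
o1, cache = recurrent_normalized(q[:4], k[:4], v[:4])           # prefill
o2, _ = recurrent_normalized(q[4:], k[4:], v[4:], state=cache)  # decode from cache
print(np.allclose(np.vstack([o1, o2]), full))  # True
```

Resuming from the cached `(S, z)` pair reproduces the full-sequence output. If only `S` were cached, the denominator for the second chunk would start from zero and cover only the new keys.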

Clarification for the paper's needle-in-a-haystack section

In the needle-in-a-haystack section of your paper, you mentioned: "However, linearizing with passkey samples (LoLCATs Llama 3 8B (Passkey)) recovers 100% accuracy." Does this step involve LoRA fine-tuning with passkey samples?...