Possible to use rotary embedding without flash attention?
Flash attention requires Ampere or newer, and while Mamba compiles on Turing, the rotary embedding doesn't work, so Google Colab T4s, 2080 Ti, and all those cards are locked out once again. Most of these SSM models are small and could probably afford the memory hit in exchange for compatibility.
The rotary implementation is 1-2 files, written in PyTorch and Triton. You can copy those files.
Would you mind elaborating on how to do what you're talking about?
This file and the one it imports: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/layers/rotary.py
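If copying those files (and their Triton dependency) still doesn't work on your card, below is a minimal pure-PyTorch sketch of the same GPT-NeoX-style rotary computation you could drop in instead. The names `build_cos_sin` and `apply_rotary_torch` are placeholders, not the flash-attn API, and this skips the fused kernels and inference cache of the real `RotaryEmbedding` class:

```python
import torch

def build_cos_sin(seqlen, rotary_dim, base=10000.0, device=None):
    # Standard rotary frequencies: theta_i = base^(-2i / rotary_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2, device=device).float() / rotary_dim))
    t = torch.arange(seqlen, device=device).float()
    freqs = torch.outer(t, inv_freq)        # (seqlen, rotary_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rotary_torch(x, cos, sin):
    # x: (batch, seqlen, nheads, headdim); cos/sin: (seqlen, rotary_dim // 2)
    # Only the first rotary_dim channels are rotated; the rest pass through.
    ro_dim = cos.shape[-1] * 2
    x_rot, x_pass = x[..., :ro_dim], x[..., ro_dim:]
    cos = torch.cat((cos, cos), dim=-1)[None, :, None, :]   # broadcast over batch/heads
    sin = torch.cat((sin, sin), dim=-1)[None, :, None, :]
    x1, x2 = x_rot.chunk(2, dim=-1)
    rotated = torch.cat((-x2, x1), dim=-1)                   # "rotate half" (GPT-NeoX style)
    return torch.cat((x_rot * cos + rotated * sin, x_pass), dim=-1)

# Usage: rotate q (and k) before attention
q = torch.randn(2, 128, 8, 64)                               # (batch, seqlen, nheads, headdim)
cos, sin = build_cos_sin(seqlen=128, rotary_dim=64)
q_rot = apply_rotary_torch(q, cos, sin)
```

This assumes the non-interleaved layout; if the checkpoint was trained with interleaved rotary, adjacent channel pairs would need to be rotated instead.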