
Possible to use rotary embedding without flash attention?

Open Ph0rk0z opened this issue 10 months ago • 3 comments

Flash Attention requires Ampere or newer GPUs, and while mamba compiles on Turing, the rotary embedding doesn't work. So Google T4s, 2080 Tis, and all those cards are locked out once again. Most of these SSM models are small and can probably afford the memory hit in exchange for compatibility.

Ph0rk0z · Feb 11 '25 13:02

The rotary implementation is 1-2 files, written in PyTorch and Triton. You can copy those files.

tridao · Feb 11 '25 14:02

> The rotary implementation is 1-2 files, written in PyTorch and Triton. You can copy those files.

Would you mind elaborating on how to do what you're talking about?

YellowRoseCx · Feb 24 '25 05:02

This file and the one it imports: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/layers/rotary.py

tridao · Feb 24 '25 06:02
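
For readers on pre-Ampere cards who can't build flash-attn, here is a minimal pure-PyTorch sketch of rotary embedding. The function names (`build_rotary_cache`, `apply_rotary`) are my own, not from the repo; it follows the non-interleaved (first-half / second-half) rotation convention that the linked `rotary.py` defaults to, but verify against that file before swapping it in.

```python
import torch


def build_rotary_cache(seq_len: int, head_dim: int, base: float = 10000.0,
                       device=None, dtype=torch.float32):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2,
                                            device=device, dtype=dtype) / head_dim))
    positions = torch.arange(seq_len, device=device, dtype=dtype)
    freqs = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()


def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate channel pairs of x by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim), pairing channel i with
    channel i + head_dim // 2 (the "interleaved=False" convention).
    """
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # Broadcast the (seq_len, half) tables over batch and head dims.
    cos = cos[None, :, None, :]
    sin = sin[None, :, None, :]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    q = torch.randn(2, 128, 8, 64)        # (batch, seq_len, n_heads, head_dim)
    cos, sin = build_rotary_cache(128, 64)
    q_rot = apply_rotary(q, cos, sin)
    print(q_rot.shape)                    # torch.Size([2, 128, 8, 64])
```

This trades the fused Triton kernel's speed for portability: everything above runs on any device PyTorch supports, which is the memory/compatibility trade-off raised in the original question.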