Possible to use rotary embedding without flash attention?
Flash attention requires Ampere or newer, and while Mamba compiles on Turing, the rotary embedding doesn't work, so Google Colab T4s, 2080 Ti, and all those cards are locked out once again. Most of these SSM models are small and could probably afford the memory hit in exchange for compatibility.
The rotary implementation is 1-2 files, written in PyTorch and Triton. You can copy those files.
Would you mind elaborating on how to do what you're talking about?
This file and the one it imports: https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/layers/rotary.py
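If copying those files (and their Triton dependency) still doesn't work on your card, below is a minimal pure-PyTorch sketch of the same GPT-NeoX-style rotary computation you could drop in instead. The names `build_cos_sin` and `apply_rotary_torch` are placeholders, not the flash-attn API, and this skips the fused kernels and inference cache of the real `RotaryEmbedding` class:

```python
import torch

def build_cos_sin(seqlen, rotary_dim, base=10000.0, device=None):
    # Standard rotary frequencies: theta_i = base^(-2i / rotary_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, rotary_dim, 2, device=device).float() / rotary_dim))
    t = torch.arange(seqlen, device=device).float()
    freqs = torch.outer(t, inv_freq)        # (seqlen, rotary_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rotary_torch(x, cos, sin):
    # x: (batch, seqlen, nheads, headdim); cos/sin: (seqlen, rotary_dim // 2)
    # Only the first rotary_dim channels are rotated; the rest pass through.
    ro_dim = cos.shape[-1] * 2
    x_rot, x_pass = x[..., :ro_dim], x[..., ro_dim:]
    cos = torch.cat((cos, cos), dim=-1)[None, :, None, :]   # broadcast over batch/heads
    sin = torch.cat((sin, sin), dim=-1)[None, :, None, :]
    x1, x2 = x_rot.chunk(2, dim=-1)
    rotated = torch.cat((-x2, x1), dim=-1)                   # "rotate half" (GPT-NeoX style)
    return torch.cat((x_rot * cos + rotated * sin, x_pass), dim=-1)

# Usage: rotate q (and k) before attention
q = torch.randn(2, 128, 8, 64)                               # (batch, seqlen, nheads, headdim)
cos, sin = build_cos_sin(seqlen=128, rotary_dim=64)
q_rot = apply_rotary_torch(q, cos, sin)
```

This assumes the non-interleaved layout; if the checkpoint was trained with interleaved rotary, adjacent channel pairs would need to be rotated instead.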