flash-mla topic

Repositories tagged with flash-mla:

Awesome-LLM-Inference
4.9k stars · 330 forks

📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

ffpa-attn
242 stars · 12 forks

🤖 FFPA: extends FlashAttention-2 with Split-D tiling, achieving ~O(1) SRAM complexity for large headdim and a 1.8x~3x speedup 🎉 vs SDPA EA.
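The Split-D idea can be sketched numerically: the score matrix S = QKᵀ is accumulated over tiles of the head dimension D, so only one D-tile of Q and K needs to be resident in fast memory at a time, independent of the total headdim. Below is a minimal NumPy sketch of that tiling, not the FFPA kernel itself; it handles a single head with a plain softmax, omits the online softmax and actual SRAM management of the real kernel, and the names (`attention_split_d`, `d_tile`) are hypothetical.

```python
import numpy as np

def attention_split_d(Q, K, V, d_tile=64):
    """Toy Split-D tiling: accumulate S = Q @ K.T over head-dim tiles,
    so only an (n, d_tile) slice of Q and K is "resident" per step.
    Hypothetical sketch; the real FFPA kernel tiles GPU SRAM instead."""
    n, d = Q.shape
    S = np.zeros((n, K.shape[0]), dtype=np.float32)
    for d0 in range(0, d, d_tile):                 # walk the head dimension
        d1 = min(d0 + d_tile, d)
        S += Q[:, d0:d1] @ K[:, d0:d1].T           # partial QK^T for this D-tile
    S /= np.sqrt(d)                                # standard attention scaling
    P = np.exp(S - S.max(axis=1, keepdims=True))   # numerically stable softmax
    P /= P.sum(axis=1, keepdims=True)
    return P @ V                                   # PV could be D-tiled the same way

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((8, 256)).astype(np.float32) for _ in range(3))
    out = attention_split_d(Q, K, V, d_tile=64)
    # Agrees with untiled attention up to float32 tolerance.
```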