flash-attention topic
flash-attention repositories
flash_attention_inference
20 stars, 2 forks
Performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
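A minimal sketch of the kind of measurement involved, using PyTorch's scaled_dot_product_attention as a stand-in baseline; the repository benchmarks the C++ FlashAttention interfaces directly, and the shapes and iteration count below are illustrative assumptions:

```python
import time
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
batch, heads, head_dim = 1, 32, 128  # illustrative LLM-style shapes

def time_attention(q_len, kv_len, iters=10):
    # Query/key/value tensors shaped (batch, heads, seq_len, head_dim).
    q = torch.randn(batch, heads, q_len, head_dim, device=device, dtype=dtype)
    k = torch.randn(batch, heads, kv_len, head_dim, device=device, dtype=dtype)
    v = torch.randn_like(k)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        F.scaled_dot_product_attention(q, k, v)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

# Prefill-style (long query) vs. decode-style (single query token) shapes.
print("prefill:", time_attention(q_len=1024, kv_len=1024))
print("decode :", time_attention(q_len=1, kv_len=1024))
```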
decoding_attention
17 stars, 1 fork
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
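A minimal plain-PyTorch sketch of what decode-stage multi-head attention computes: a single new query token attends over the cached keys and values. This is a reference for the computation only, not the repository's CUDA-core kernel, and all shapes are illustrative assumptions:

```python
import torch

batch, heads, head_dim, cache_len = 2, 8, 64, 128  # illustrative shapes

q = torch.randn(batch, heads, 1, head_dim)          # single decode-step query
k_cache = torch.randn(batch, heads, cache_len, head_dim)
v_cache = torch.randn(batch, heads, cache_len, head_dim)

# Scaled dot-product attention of the new token over the KV cache.
scores = q @ k_cache.transpose(-2, -1) / head_dim ** 0.5  # (batch, heads, 1, cache_len)
probs = scores.softmax(dim=-1)
out = probs @ v_cache                                      # (batch, heads, 1, head_dim)
print(out.shape)  # torch.Size([2, 8, 1, 64])
```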
flashattention2-custom-mask
65 stars, 5 forks
Triton implementation of FlashAttention2 that adds support for custom masks.
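A minimal plain-PyTorch sketch of attention with an arbitrary (custom) mask, to illustrate the idea; the repository applies the masking inside a fused Triton FlashAttention2 kernel rather than materializing the full score matrix. The block-diagonal mask and shapes below are illustrative assumptions:

```python
import torch

batch, heads, seq, head_dim = 1, 4, 16, 32  # illustrative shapes
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Any boolean mask works, e.g. a block-diagonal mask instead of the usual causal one.
custom_mask = torch.zeros(seq, seq, dtype=torch.bool)
custom_mask[:8, :8] = True
custom_mask[8:, 8:] = True

scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
scores = scores.masked_fill(~custom_mask, float("-inf"))  # disallow masked-out positions
out = scores.softmax(dim=-1) @ v
print(out.shape)  # torch.Size([1, 4, 16, 32])
```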
Inf-CLIP
270 stars, 12 forks
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.
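A minimal sketch of the standard CLIP contrastive loss in plain PyTorch, showing the batch-by-batch similarity matrix whose memory cost Inf-CL is designed to avoid materializing in full; this reference is not the repository's implementation, and the batch size, embedding dimension, and temperature are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

batch, dim, temperature = 32, 512, 0.07               # illustrative values
img = F.normalize(torch.randn(batch, dim), dim=-1)    # image embeddings
txt = F.normalize(torch.randn(batch, dim), dim=-1)    # text embeddings

# The (batch, batch) similarity matrix is what dominates memory at very large batch sizes.
logits = img @ txt.t() / temperature
labels = torch.arange(batch)                          # matching pairs sit on the diagonal
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2
print(loss.item())
```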