flash-attention topic

Repositories matching this topic:

flash_attention_inference

20 stars · 2 forks

Benchmarks the performance of the C++ interface of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
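
As a rough illustration of the kind of attention call such benchmarks exercise, here is a hedged PyTorch sketch of a prefill-stage attention computation; the shapes are assumptions and this is a stand-in, not the repository's C++ interface:

```python
import torch
import torch.nn.functional as F

# Assumed prefill-stage shapes (not the repo's C++ API):
# 2 sequences, 16 heads, 128 prompt tokens, head_dim 64. Requires a CUDA GPU.
q = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")

# PyTorch dispatches this call to a FlashAttention kernel when one is available;
# the repository measures the analogous C++ entry points directly.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 16, 128, 64])
```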

decoding_attention

17 stars · 1 fork

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
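
A minimal sketch of what decode-stage multi-head attention computes, assuming a single new query token attending over a pre-filled KV cache (plain PyTorch for clarity, not this repository's CUDA-core kernels):

```python
import torch

# Assumed decode-stage shapes: one new query token per sequence,
# a cached context of 512 tokens, 16 heads, head_dim 64.
batch, heads, ctx_len, head_dim = 2, 16, 512, 64
q = torch.randn(batch, heads, 1, head_dim)          # single decode token
k_cache = torch.randn(batch, heads, ctx_len, head_dim)
v_cache = torch.randn(batch, heads, ctx_len, head_dim)

scale = head_dim ** -0.5
scores = (q @ k_cache.transpose(-2, -1)) * scale     # (batch, heads, 1, ctx_len)
probs = scores.softmax(dim=-1)
out = probs @ v_cache                                # (batch, heads, 1, head_dim)
print(out.shape)
```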

flashattention2-custom-mask

65 stars · 5 forks

Triton implementation of FlashAttention-2 that adds support for custom masks.
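
For intuition, a hedged PyTorch stand-in showing what a custom boolean attention mask does; the mask layout and the call below are assumptions for illustration, not the Triton kernel's interface:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 1, 8, 6, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Arbitrary boolean mask: True = attend, False = block. A block-diagonal
# pattern like this cannot be expressed with a plain causal flag.
mask = torch.zeros(seq, seq, dtype=torch.bool)
mask[:3, :3] = True
mask[3:, 3:] = True

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 6, 64])
```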

Inf-CLIP

270 stars · 12 forks

[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.
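
For context, a hedged sketch of the general idea of computing a contrastive (InfoNCE) loss in chunks so the full batch-by-batch logit matrix is never materialized at once; this is a simplification for illustration, not Inf-CL's actual tile-wise implementation, and only the image-to-text direction is shown:

```python
import torch
import torch.nn.functional as F

def chunked_clip_loss(img_emb, txt_emb, temperature=0.07, chunk=1024):
    # Normalize embeddings as in CLIP-style training.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    n = img_emb.size(0)
    losses = []
    for start in range(0, n, chunk):
        rows = img_emb[start:start + chunk]             # (chunk, d)
        logits = rows @ txt_emb.t() / temperature       # (chunk, n), never the full n x n
        targets = torch.arange(start, start + rows.size(0))
        losses.append(F.cross_entropy(logits, targets, reduction="sum"))
    return torch.stack(losses).sum() / n

loss = chunked_clip_loss(torch.randn(4096, 512), torch.randn(4096, 512))
print(loss.item())
```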