flash-attention topic

Repositories matching this topic:

flash_attention_inference

20 stars · 2 forks

Benchmarks the performance of the C++ interface of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
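
As a rough illustration of the kind of attention call such benchmarks exercise, here is a hedged PyTorch sketch of a prefill-stage attention computation; the shapes are assumptions and this is a stand-in, not the repository's C++ interface:

```python
import torch
import torch.nn.functional as F

# Assumed prefill-stage shapes (not the repo's C++ API):
# 2 sequences, 16 heads, 128 prompt tokens, head_dim 64. Requires a CUDA GPU.
q = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 16, 128, 64, dtype=torch.float16, device="cuda")

# PyTorch dispatches this call to a FlashAttention kernel when one is available;
# the repository measures the analogous C++ entry points directly.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 16, 128, 64])
```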

decoding_attention

17 stars · 1 fork

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
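
A minimal sketch of what decode-stage multi-head attention computes, assuming a single new query token attending over a pre-filled KV cache (plain PyTorch for clarity, not this repository's CUDA-core kernels):

```python
import torch

# Assumed decode-stage shapes: one new query token per sequence,
# a cached context of 512 tokens, 16 heads, head_dim 64.
batch, heads, ctx_len, head_dim = 2, 16, 512, 64
q = torch.randn(batch, heads, 1, head_dim)          # single decode token
k_cache = torch.randn(batch, heads, ctx_len, head_dim)
v_cache = torch.randn(batch, heads, ctx_len, head_dim)

scale = head_dim ** -0.5
scores = (q @ k_cache.transpose(-2, -1)) * scale     # (batch, heads, 1, ctx_len)
probs = scores.softmax(dim=-1)
out = probs @ v_cache                                # (batch, heads, 1, head_dim)
print(out.shape)
```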

flashattention2-custom-mask

65 stars · 5 forks

Triton implementation of FlashAttention-2 that adds support for custom masks.
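
For intuition, a hedged PyTorch stand-in showing what a custom boolean attention mask does; the mask layout and the call below are assumptions for illustration, not the Triton kernel's interface:

```python
import torch
import torch.nn.functional as F

batch, heads, seq, head_dim = 1, 8, 6, 64
q = torch.randn(batch, heads, seq, head_dim)
k = torch.randn(batch, heads, seq, head_dim)
v = torch.randn(batch, heads, seq, head_dim)

# Arbitrary boolean mask: True = attend, False = block. A block-diagonal
# pattern like this cannot be expressed with a plain causal flag.
mask = torch.zeros(seq, seq, dtype=torch.bool)
mask[:3, :3] = True
mask[3:, 3:] = True

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 6, 64])
```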

Inf-CLIP

270 stars · 12 forks

[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.
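
For context, a hedged sketch of the general idea of computing a contrastive (InfoNCE) loss in chunks so the full batch-by-batch logit matrix is never materialized at once; this is a simplification for illustration, not Inf-CL's actual tile-wise implementation, and only the image-to-text direction is shown:

```python
import torch
import torch.nn.functional as F

def chunked_clip_loss(img_emb, txt_emb, temperature=0.07, chunk=1024):
    # Normalize embeddings as in CLIP-style training.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    n = img_emb.size(0)
    losses = []
    for start in range(0, n, chunk):
        rows = img_emb[start:start + chunk]             # (chunk, d)
        logits = rows @ txt_emb.t() / temperature       # (chunk, n), never the full n x n
        targets = torch.arange(start, start + rows.size(0))
        losses.append(F.cross_entropy(logits, targets, reduction="sum"))
    return torch.stack(losses).sum() / n

loss = chunked_clip_loss(torch.randn(4096, 512), torch.randn(4096, 512))
print(loss.item())
```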