flashinfer topic

List flashinfer repositories

decoding_attention

46 Stars, 4 Forks, 46 Watchers

Decoding Attention is optimized for MHA, MQA, GQA, and MLA in the decoding stage of LLM inference, and runs on CUDA cores.

decoding-attention

flash-attention

whl

17 Stars, 4 Forks, 17 Watchers

Kernel Library Wheel for SGLang