flashinfer topic


decoding_attention
46 stars · 4 forks · 46 watchers

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference.
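To make the decoding-stage workload concrete, here is a minimal NumPy sketch of grouped-query attention (GQA) for a single new token attending over a KV cache. All names and shapes are illustrative assumptions, not the repository's actual API; MHA is the special case where the number of KV heads equals the number of query heads, and MQA is the case of a single KV head.

```python
import numpy as np

def gqa_decode(q, k_cache, v_cache, num_kv_heads):
    # q: (num_q_heads, head_dim) -- the new token's query, one row per head
    # k_cache, v_cache: (num_kv_heads, seq_len, head_dim) -- cached keys/values
    num_q_heads, head_dim = q.shape
    group = num_q_heads // num_kv_heads  # query heads sharing one KV head
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                                   # shared KV head index
        scores = k_cache[kv] @ q[h] / np.sqrt(head_dim)   # (seq_len,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                          # softmax over cache
        out[h] = weights @ v_cache[kv]                    # (head_dim,)
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 64))        # 8 query heads
k = rng.standard_normal((2, 16, 64))    # 2 KV heads, 16 cached tokens
v = rng.standard_normal((2, 16, 64))
out = gqa_decode(q, k, v, num_kv_heads=2)
print(out.shape)  # (8, 64)
```

Because decoding processes one token at a time, each step is a batch of small matrix-vector products over the KV cache, which is why such kernels target CUDA cores rather than the tensor-core GEMM path used for prefill.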

whl
17 stars · 4 forks · 17 watchers

Kernel Library Wheel for SGLang