decoding-attention topic

Repositories:

decoding_attention — 46 stars, 4 forks, 46 watchers
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
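To illustrate what decode-stage attention computes (independent of this repository's actual CUDA implementation), here is a minimal NumPy sketch: a single new query token attends over the cached keys and values. The function name and shapes are assumptions for illustration; with `num_q_heads == num_kv_heads` it behaves like MHA, with `num_kv_heads == 1` like MQA, and anything in between like GQA.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Decode-stage attention for one new token (illustrative sketch).

    q:       [num_q_heads, head_dim]          query for the single new token
    k_cache: [num_kv_heads, seq_len, head_dim] cached keys
    v_cache: [num_kv_heads, seq_len, head_dim] cached values
    """
    num_q_heads, head_dim = q.shape
    num_kv_heads = k_cache.shape[0]
    group = num_q_heads // num_kv_heads      # q heads per shared kv head
    scale = 1.0 / np.sqrt(head_dim)
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                      # kv head shared by this q head
        scores = k_cache[kv] @ q[h] * scale  # [seq_len]
        scores -= scores.max()               # softmax numerical stability
        probs = np.exp(scores)
        probs /= probs.sum()
        out[h] = probs @ v_cache[kv]         # weighted sum of cached values
    return out
```

The decoding stage is memory-bound (one query token against a long KV cache), which is why a kernel tuned for this shape can use CUDA cores effectively rather than relying on Tensor-Core GEMMs sized for prefill.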