decoding-attention topic

Repositories:

decoding_attention — 46 stars, 4 forks, 46 watchers
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
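To illustrate what decode-stage attention computes (independent of this repository's actual CUDA implementation), here is a minimal NumPy sketch: a single new query token attends over the cached keys and values. The function name and shapes are assumptions for illustration; with `num_q_heads == num_kv_heads` it behaves like MHA, with `num_kv_heads == 1` like MQA, and anything in between like GQA.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Decode-stage attention for one new token (illustrative sketch).

    q:       [num_q_heads, head_dim]          query for the single new token
    k_cache: [num_kv_heads, seq_len, head_dim] cached keys
    v_cache: [num_kv_heads, seq_len, head_dim] cached values
    """
    num_q_heads, head_dim = q.shape
    num_kv_heads = k_cache.shape[0]
    group = num_q_heads // num_kv_heads      # q heads per shared kv head
    scale = 1.0 / np.sqrt(head_dim)
    out = np.empty_like(q)
    for h in range(num_q_heads):
        kv = h // group                      # kv head shared by this q head
        scores = k_cache[kv] @ q[h] * scale  # [seq_len]
        scores -= scores.max()               # softmax numerical stability
        probs = np.exp(scores)
        probs /= probs.sum()
        out[h] = probs @ v_cache[kv]         # weighted sum of cached values
    return out
```

The decoding stage is memory-bound (one query token against a long KV cache), which is why a kernel tuned for this shape can use CUDA cores effectively rather than relying on Tensor-Core GEMMs sized for prefill.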