decoding-attention topic

decoding_attention

17 stars · 1 fork

Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores, for the decoding stage of LLM inference.
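For context, the decoding stage differs from prefill in that each step processes a single new query token against the accumulated KV cache, so the workload is memory-bound rather than compute-bound. A minimal NumPy sketch of one such decode step (illustrative only, not this repository's CUDA implementation; the function name and shapes are hypothetical):

```python
import numpy as np

def mha_decode_step(q, k_cache, v_cache):
    """One decoding step of multi-head attention (illustrative sketch).

    q:       (num_heads, head_dim)          query for the single new token
    k_cache: (num_heads, seq_len, head_dim) cached keys
    v_cache: (num_heads, seq_len, head_dim) cached values
    returns: (num_heads, head_dim)          attention output for this token
    """
    head_dim = q.shape[-1]
    # Scaled dot-product scores: one row per head over the cached sequence.
    scores = np.einsum("hd,hsd->hs", q, k_cache) / np.sqrt(head_dim)
    # Softmax over the sequence dimension (numerically stabilized).
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of cached values.
    return np.einsum("hs,hsd->hd", weights, v_cache)
```

Because the query has length 1, the matrix multiplies degenerate to matrix-vector products, which is why CUDA-core kernels can be competitive here.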