decoding_attention (17 stars, 1 fork)
Decoding Attention is a multi-head attention (MHA) implementation specially optimized with CUDA Cores for the decoding stage of LLM inference.
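The decoding stage computes attention for a single new query token against the cached keys and values of all previous tokens. The sketch below is a framework-free illustration of that computation, not the library's CUDA kernel; the function name and list-based representation are assumptions for clarity.

```python
import math

def decode_step_attention(q, K, V):
    """One decoding step of single-query attention (simplified sketch).

    q: list[float] of length d          -- query for the new token
    K: list[list[float]], shape (t, d)  -- cached keys for t past tokens
    V: list[list[float]], shape (t, d)  -- cached values for t past tokens
    Returns the attention output, a list[float] of length d.
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    # Scaled dot-product score of the query against every cached key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Output is the softmax-weighted sum of cached values.
    return [sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)]
```

Because the query is a single token, this step is memory-bandwidth-bound (it streams the whole KV cache), which is why a decoding-specific kernel can outperform a general-purpose MHA kernel tuned for long query sequences.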