decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
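At the decoding stage, each new token issues a single query per head against the full KV cache, so the workload is memory-bound and can be served by CUDA cores without tensor cores. The sketch below is a naive reference implementation of this single-query attention, not the library's optimized kernel; all names, tensor layouts, and the launch configuration are assumptions for illustration.

```cuda
// Naive reference sketch of decode-stage MHA (one query token per head).
// Assumed layouts (not the library's API):
//   q:       [num_heads, head_dim]            -- the single new token
//   k_cache: [num_heads, seq_len, head_dim]
//   v_cache: [num_heads, seq_len, head_dim]
//   out:     [num_heads, head_dim]
#include <cuda_runtime.h>
#include <math.h>

__global__ void decode_mha_naive(const float* q, const float* k_cache,
                                 const float* v_cache, float* out,
                                 int seq_len, int head_dim) {
    int h = blockIdx.x;  // one thread block per head
    const float* qh = q + h * head_dim;
    const float* kh = k_cache + (size_t)h * seq_len * head_dim;
    const float* vh = v_cache + (size_t)h * seq_len * head_dim;
    float scale = rsqrtf((float)head_dim);

    extern __shared__ float scores[];  // [seq_len] attention scores

    // Each thread computes q.k for a strided subset of cached positions.
    for (int t = threadIdx.x; t < seq_len; t += blockDim.x) {
        float dot = 0.f;
        for (int d = 0; d < head_dim; ++d)
            dot += qh[d] * kh[t * head_dim + d];
        scores[t] = dot * scale;
    }
    __syncthreads();

    // Thread 0 applies a simple (unoptimized) softmax over the scores.
    if (threadIdx.x == 0) {
        float m = scores[0];
        for (int t = 1; t < seq_len; ++t) m = fmaxf(m, scores[t]);
        float sum = 0.f;
        for (int t = 0; t < seq_len; ++t) {
            scores[t] = expf(scores[t] - m);
            sum += scores[t];
        }
        for (int t = 0; t < seq_len; ++t) scores[t] /= sum;
    }
    __syncthreads();

    // Each thread accumulates a strided subset of output dimensions.
    for (int d = threadIdx.x; d < head_dim; d += blockDim.x) {
        float acc = 0.f;
        for (int t = 0; t < seq_len; ++t)
            acc += scores[t] * vh[t * head_dim + d];
        out[h * head_dim + d] = acc;
    }
}

// Example launch (one block per head, scores kept in dynamic shared memory):
// decode_mha_naive<<<num_heads, 128, seq_len * sizeof(float)>>>(
//     q, k_cache, v_cache, out, seq_len, head_dim);
```

An optimized kernel would instead parallelize the softmax reduction across the block and tile the KV cache through shared memory; the version above only fixes the shapes and dataflow of the decode step.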