cuda-core topic

List cuda-core repositories

cuda_hgemv

48 Stars, 4 Forks

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

decoding_attention

46 Stars, 4 Forks, 46 Watchers

Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.