Bruce-Lee-LY

Results: 5 repositories owned by Bruce-Lee-LY

cuda_hgemm

270 stars

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Topics: cublas, cuda, gemm, gpu

cuda_hook

129 stars

Hooks CUDA-related dynamic libraries using automated code-generation tools.

Topics: auto-generate, code-generate, cublas, cublaslt

cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

Topics: cublas, cuda, cuda-core, gemm

flash_attention_inference

Performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.

Topics: cuda, cutlass, flash-attention, flash-attention-2

decoding_attention

Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference.

Topics: cuda, cuda-core, decoding-attention, flash-attention