Yujia Zhai

Results: 2 repositories owned by Yujia Zhai

Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

237 stars, 41 forks

Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.

Stepwise optimization of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.