cublas topic
bandicoot-code
Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io
awesome-cuda-and-hpc
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
cuda_hook
Hooks CUDA-related dynamic libraries using automated code generation tools.
DSAbeamformer
Real-time GPU Beamformer for DSA110 written in C/CUDA
Tiled-MM
Matrix multiplication on GPUs for matrices stored in CPU (host) memory. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
mkl-verbose-toolkit
Tools to run and parse MKL verbose mode
cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
cuda-beginner-course-cpp-version
Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"