gemm topic

Repositories tagged with the gemm topic

cublasHgemm-P100

34 stars, 13 forks

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs using cublasHgemm.
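GEMM benchmarks like this one usually report throughput as 2*M*N*K floating-point operations (one multiply and one add per term of each of the M*N dot products of length K) divided by the measured runtime. The C sketch below shows only that arithmetic. The function names are hypothetical and not taken from the repository, which may compute or report its figures differently.

```c
#include <assert.h>

/* Floating-point operations in one GEMM C = A*B with A (m x k), B (k x n):
 * each of the m*n outputs needs k multiplies and k adds, hence 2*m*n*k. */
double gemm_flops(long m, long n, long k) {
    assert(m > 0 && n > 0 && k > 0);
    return 2.0 * (double)m * (double)n * (double)k;
}

/* Throughput in TFLOPS given a measured kernel runtime in seconds. */
double gemm_tflops(long m, long n, long k, double seconds) {
    assert(seconds > 0.0);
    return gemm_flops(m, n, k) / seconds / 1e12;
}
```

For example, a 4096 x 4096 x 4096 half-precision GEMM performs 2^37 (about 1.37e11) operations, so finishing it in 10 ms corresponds to roughly 13.7 TFLOPS.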

openai-gemm.pytorch

20 stars, 4 forks

PyTorch bindings for openai-gemm

Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even when multithreaded.
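Two common early steps in this kind of progression are reordering the loops so the innermost loop walks memory contiguously, and then blocking (tiling) the loops so the working set stays in cache. The C sketch below shows those two steps, assuming row-major storage. `BLOCK = 64` and the function names are hypothetical choices, and this is not the project's code. Later optimizations typical of such work, such as packing and SIMD micro-kernels, are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major layout: A is m x k, B is k x n, C is m x n. */

/* Step 0: textbook (i, j, p) loop. The inner loop reads B down a column
 * with stride n, which is cache-unfriendly for large n. */
void dgemm_naive(size_t m, size_t n, size_t k,
                 const double *A, const double *B, double *C) {
    assert(C != A && C != B);  /* output must not alias the inputs */
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}

/* Hypothetical tile size; real code tunes this to the cache hierarchy. */
#define BLOCK 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Steps 1 + 2: (i, p, j) order so the inner loop streams rows of B and C
 * contiguously, with all three loops tiled into BLOCK-sized chunks. */
void dgemm_blocked(size_t m, size_t n, size_t k,
                   const double *A, const double *B, double *C) {
    assert(C != A && C != B);
    for (size_t i = 0; i < m * n; ++i) C[i] = 0.0;
    for (size_t ii = 0; ii < m; ii += BLOCK)
        for (size_t pp = 0; pp < k; pp += BLOCK)
            for (size_t jj = 0; jj < n; jj += BLOCK)
                for (size_t i = ii; i < min_sz(ii + BLOCK, m); ++i)
                    for (size_t p = pp; p < min_sz(pp + BLOCK, k); ++p) {
                        double a = A[i * k + p];  /* reused across the j loop */
                        for (size_t j = jj; j < min_sz(jj + BLOCK, n); ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
}
```

Both functions compute the same product; because the summation order differs, results can differ in the last bits for general inputs, so comparisons against a reference should use a tolerance.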

mmul

34 stars, 6 forks

Serial and parallel implementations of matrix multiplication
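The simplest parallel matrix multiplication splits the rows of C across threads, since every row is computed independently and no synchronization is needed. The C sketch below uses an OpenMP `parallel for` pragma and assumes row-major storage. The function name is hypothetical, and the repository may use a different threading model (for example pthreads or MPI).

```c
#include <assert.h>
#include <stddef.h>

/* Row-parallel C = A*B with A (m x k), B (k x n), C (m x n), row-major.
 * Compile with -fopenmp (GCC/Clang) to enable threading; without it the
 * pragma is ignored and the loop runs serially with identical results. */
void matmul_parallel(int m, int n, int k,
                     const double *A, const double *B, double *C) {
    assert(m >= 0 && n >= 0 && k >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < m; ++i) {
        double *c_row = C + (size_t)i * n;  /* each thread owns whole rows */
        for (int j = 0; j < n; ++j) c_row[j] = 0.0;
        for (int p = 0; p < k; ++p) {
            double a = A[(size_t)i * k + p];
            const double *b_row = B + (size_t)p * n;
            for (int j = 0; j < n; ++j) c_row[j] += a * b_row[j];
        }
    }
}
```

Calling `matmul_parallel(2, 2, 3, A, B, C)` with `A = {1,2,3,4,5,6}` and `B = {7,8,9,10,11,12}` fills `C` with `{58, 64, 139, 154}`.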

awesome-cuda-and-hpc

134 stars, 16 forks

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

cuda_hgemm

270 stars, 62 forks

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
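A basic WMMA kernel typically gives each 16x16 tile of C to one warp. That warp keeps an accumulator fragment, repeatedly loads 16x16 fragments of A and B (`wmma::load_matrix_sync`), multiplies them with `wmma::mma_sync` while stepping along K, and finally writes the tile with `wmma::store_matrix_sync`. The sketch below is a plain-C CPU reference of that work decomposition, not Tensor Core code. It uses `float` in place of `half` because C has no portable half type, assumes row-major storage with M, N and K divisible by 16, and uses hypothetical function names. The repository's kernels may tile differently, for example with larger block tiles staged through shared memory.

```c
#include <assert.h>

/* 16x16x16 is one of the fragment shapes WMMA supports for half inputs. */
enum { WM = 16, WN = 16, WK = 16 };

/* One mma step on a tile: acc(16x16) += a(16x16) * b(16x16).
 * lda/ldb are the row strides (leading dimensions) of the full matrices. */
static void tile_mma(const float *a, int lda, const float *b, int ldb,
                     float acc[WM][WN]) {
    for (int i = 0; i < WM; ++i)
        for (int p = 0; p < WK; ++p)
            for (int j = 0; j < WN; ++j)
                acc[i][j] += a[i * lda + p] * b[p * ldb + j];
}

/* Row-major C(MxN) = A(MxK) * B(KxN). Each (tm, tn) iteration corresponds
 * to the work one warp would do in a simple WMMA kernel. */
void hgemm_tiled_reference(int M, int N, int K,
                           const float *A, const float *B, float *C) {
    assert(M % WM == 0 && N % WN == 0 && K % WK == 0);
    for (int tm = 0; tm < M; tm += WM)
        for (int tn = 0; tn < N; tn += WN) {
            float acc[WM][WN] = {{0}};           /* cf. wmma::fill_fragment */
            for (int tk = 0; tk < K; tk += WK)   /* cf. load + mma_sync */
                tile_mma(&A[tm * K + tk], K, &B[tk * N + tn], N, acc);
            for (int i = 0; i < WM; ++i)         /* cf. store_matrix_sync */
                for (int j = 0; j < WN; ++j)
                    C[(tm + i) * N + tn + j] = acc[i][j];
        }
}
```

The accumulator is kept in `float`, matching the common practice of pairing half-precision WMMA inputs with a float accumulator fragment.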

hipBLASLt

37 stars, 52 forks

hipBLASLt is a library that provides general matrix-matrix operations through a flexible API and extends functionality beyond that of a traditional BLAS library.

spla

26 stars, 5 forks

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

ozIMMU

44 stars, 2 forks

FP64-equivalent GEMM via INT8 Tensor Cores using the Ozaki scheme.
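The Ozaki scheme splits each FP64 operand into a sum of low-precision slices so that every slice-by-slice product can be computed exactly in integer arithmetic, and recombines the scaled partial results in FP64. The C sketch below illustrates the idea for a single dot product on the CPU. It assumes int8 slices carrying 7 bits each, one power-of-two scale per vector, and that all slice pairs are computed. The names, the slice count `SLICES` and the size limit `MAXN` are hypothetical. ozIMMU's actual GPU implementation (per-row/per-column scaling, number of slices, which pairs it keeps) may differ. Accuracy here is relative to the largest element of each vector.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define SLICES 8   /* 8 slices x 7 bits = 56 bits >= FP64's 53-bit significand */
#define MAXN 1024  /* hypothetical size limit for this sketch's buffers */

/* Split v[0..n) into SLICES int8 slices sharing one power-of-two scale:
 * v[i] ~= 2^e * sum_t s[t][i] * 2^(-7(t+1)). Returns e. Inputs must be finite. */
static int split_int8(const double *v, int n, int8_t s[SLICES][MAXN]) {
    double amax = 0.0;
    for (int i = 0; i < n; ++i)
        if (fabs(v[i]) > amax) amax = fabs(v[i]);
    int e;
    (void)frexp(amax, &e);               /* amax < 2^e (e = 0 when amax = 0) */
    for (int i = 0; i < n; ++i) {
        double r = ldexp(v[i], -e);      /* exact scaling, |r| < 1 */
        for (int t = 0; t < SLICES; ++t) {
            r *= 128.0;                  /* expose the next 7 bits, |r| < 128 */
            int d = (int)r;              /* truncate toward zero, |d| <= 127 */
            s[t][i] = (int8_t)d;
            r -= d;                      /* exact remainder, |r| < 1 */
        }
    }
    return e;
}

/* Every slice-pair product is an int8 x int8 dot product with exact int32
 * accumulation (the operation INT8 Tensor Cores provide); only the final
 * scaled sum uses FP64 arithmetic. Not thread-safe (static buffers). */
double ozaki_dot(const double *x, const double *y, int n) {
    assert(n >= 0 && n <= MAXN);  /* also keeps n*127*127 below INT32_MAX */
    static int8_t sx[SLICES][MAXN], sy[SLICES][MAXN];
    int ex = split_int8(x, n, sx), ey = split_int8(y, n, sy);
    double result = 0.0;
    for (int t = 0; t < SLICES; ++t)
        for (int u = 0; u < SLICES; ++u) {
            int32_t acc = 0;
            for (int i = 0; i < n; ++i)
                acc += (int32_t)sx[t][i] * sy[u][i];
            result += ldexp((double)acc, ex + ey - 7 * (t + u + 2));
        }
    return result;
}
```

For `x = {1.5, -2.25, 3.0}` and `y = {0.5, 4.0, -1.0}` every value fits in a single slice, and `ozaki_dot(x, y, 3)` returns exactly `-11.25`. A GEMM version applies the same splitting to rows of A and columns of B and replaces each integer dot product with an int8 GEMM.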

cuda_hgemv

48 stars, 4 forks

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
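A common CUDA-core mapping for GEMV assigns one warp to each row of A. Each of the 32 lanes sums a strided subset of that row (which gives coalesced loads on the GPU), and the partial sums are then combined with a tree reduction, typically written with `__shfl_down_sync`. The C sketch below emulates that data flow on the CPU so the indexing can be checked. It uses `float` instead of `half` because C has no portable half type, assumes row-major storage, and uses hypothetical function names. The repository's kernels may use other mappings, for example several rows per warp for short rows.

```c
#include <assert.h>
#include <stddef.h>

#define WARP 32  /* threads per warp on NVIDIA GPUs */

/* Emulates one warp computing dot(a_row, x): lane l accumulates elements
 * l, l+32, l+64, ..., then a tree reduction folds the 32 partial sums
 * into lane 0 (the step a kernel would do with __shfl_down_sync). */
static float warp_row_dot(const float *a_row, const float *x, int n) {
    float lane_sum[WARP] = {0};
    for (int lane = 0; lane < WARP; ++lane)
        for (int j = lane; j < n; j += WARP)
            lane_sum[lane] += a_row[j] * x[j];
    for (int offset = WARP / 2; offset > 0; offset /= 2)
        for (int lane = 0; lane < offset; ++lane)
            lane_sum[lane] += lane_sum[lane + offset];
    return lane_sum[0];
}

/* Row-major y(m) = A(m x n) * x(n), one emulated warp per row. */
void gemv_warp_per_row(int m, int n, const float *A, const float *x, float *y) {
    assert(m >= 0 && n >= 0);
    for (int row = 0; row < m; ++row)
        y[row] = warp_row_dot(A + (size_t)row * n, x, n);
}
```

With `A = {1,2,3,4,5,6}` (2 x 3) and `x = {1,1,1}`, `gemv_warp_per_row(2, 3, A, x, y)` produces `y = {6, 15}`.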