gemm topic

List gemm repositories

cublasHgemm-P100

34 Stars, 13 Forks

Code for testing native float16 matrix multiplication performance on Tesla P100 and V100 GPUs using cublasHgemm.

openai-gemm.pytorch

20 Stars, 4 Forks

PyTorch bindings for openai-gemm

Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F

102 Stars, 19 Forks

Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even with multithreading.

mmul

34 Stars, 6 Forks

Serial and parallel implementations of matrix multiplication

CoffeeBeforeArch

matrix-multiplication

awesome-cuda-and-hpc

134 Stars, 16 Forks

🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.

cuda_hgemm

270 Stars, 62 Forks

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

hipBLASLt

37 Stars, 52 Forks

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.

spla

26 Stars, 5 Forks

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

ozIMMU

44 Stars, 2 Forks

FP64-equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme.

mixed-precision

cuda_hgemv

48 Stars, 4 Forks

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.