Customized matrix multiplication kernels

Custom Matmul Kernels

This repository contains the source code for this blog post.

Dependencies

  • Python 3.7.10 or higher
  • CuPy 7.4.0 or higher
  • PyTorch 1.8.1 or higher
  • CUDA (only tested with 11.2)
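
The minimum versions listed above can be checked before running the kernels. The following is a minimal sketch, not part of this repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and it assumes the installed packages expose a `__version__` string, as CuPy and PyTorch do.

```python
import importlib
import re
import sys

# Minimum versions copied from the dependency list in this README.
MIN_PYTHON = (3, 7, 10)
MINIMUMS = {"cupy": (7, 4, 0), "torch": (1, 8, 1)}


def parse_version(text):
    """Return the leading numeric part of a version string as a 3-tuple.

    Suffixes such as "+cu111" or "rc1" are ignored; missing components
    are treated as 0, so "11.2" becomes (11, 2, 0).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", str(text))
    if match is None:
        raise ValueError("unrecognized version string: %r" % (text,))
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(text, minimum):
    """True if the version string is at least the given minimum tuple."""
    return parse_version(text) >= tuple(minimum)


def check_environment():
    """Return a list of problems found; the list is empty if all checks pass."""
    problems = []
    if tuple(sys.version_info[:3]) < MIN_PYTHON:
        problems.append("Python %d.%d.%d or higher is required" % MIN_PYTHON)
    for name, minimum in MINIMUMS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("%s is not installed" % name)
            continue
        if not meets_minimum(module.__version__, minimum):
            problems.append(
                "%s %s is older than %d.%d.%d"
                % ((name, module.__version__) + tuple(minimum))
            )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

The sketch does not check the CUDA version; since the code was only tested with CUDA 11.2, other versions may or may not work.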