custom_matmul_kernels
Customized matrix multiplication kernels
Custom Matmul Kernels
This repository contains the source code for this blog post.
Dependencies
- Python 3.7.10 or higher
- CuPy 7.4.0 or higher
- PyTorch 1.8.1 or higher
- Tested only with CUDA 11.2
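Before running the kernels, it can help to confirm that your environment meets the minimum versions listed above. The script below is a minimal sketch and is not part of this repository. The names `parse_version`, `check_environment`, `MIN_PYTHON` and `MIN_VERSIONS` are hypothetical. It assumes that CuPy and PyTorch expose `__version__` strings, which both libraries do. It does not check the CUDA version, because 11.2 is the only version the code was tested with rather than a stated minimum.

```python
import importlib
import sys

# Hypothetical helper: minimum versions copied from the dependency list.
# CUDA is deliberately not checked; 11.2 is the only tested version, not a floor.
MIN_PYTHON = (3, 7, 10)
MIN_VERSIONS = {"cupy": (7, 4, 0), "torch": (1, 8, 1)}


def parse_version(text):
    """Convert a version string like '1.8.1+cu111' into a tuple of ints.

    The local suffix after '+' is dropped, and within each dot-separated
    component only the leading digits are kept (so '0rc1' becomes 0).
    Parsing stops at the first component that has no leading digit.
    """
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment():
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        found = ".".join(str(n) for n in sys.version_info[:3])
        problems.append(f"Python {found} is older than 3.7.10")
    for name, minimum in MIN_VERSIONS.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append(f"{name} is not installed")
            continue
        if parse_version(module.__version__) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} {module.__version__} is older than {wanted}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("All minimum dependency versions are satisfied.")
```

Versions are compared as integer tuples rather than strings, so `1.10.0` correctly ranks above `1.8.1`. Local build tags such as PyTorch's `+cu111` are ignored.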