Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
eth-cscs
Matrix multiplication on GPUs for matrices stored in host (CPU) memory. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
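
The general idea behind multiplying host-resident matrices on a GPU is to split them into tiles, move one pair of tiles to the device at a time, and accumulate the partial products. The sketch below illustrates that tiling scheme in plain Python on the CPU only. It is not this library's API: the names `tiled_matmul`, `slice_tile`, and `matmul_tile` and the `tile` parameter are hypothetical, the tile copies merely stand in for host-to-device transfers, and inputs are assumed to be non-empty row-major lists of lists.

```python
def matmul_tile(a_tile, b_tile, c_tile):
    """Accumulate a_tile @ b_tile into c_tile (stand-in for a device GEMM call)."""
    for i, a_row in enumerate(a_tile):
        for p, a_ip in enumerate(a_row):
            b_row = b_tile[p]
            for j, b_pj in enumerate(b_row):
                c_tile[i][j] += a_ip * b_pj


def slice_tile(mat, r0, r1, c0, c1):
    """Copy a rectangular block out of a matrix (stand-in for a host-to-device copy)."""
    return [row[c0:c1] for row in mat[r0:r1]]


def tiled_matmul(a, b, tile=2):
    """Compute a @ b by processing tile x tile blocks one at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        i1 = min(i0 + tile, m)
        for j0 in range(0, n, tile):
            j1 = min(j0 + tile, n)
            # The output tile is kept in place while the inner dimension is swept.
            c_tile = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for p0 in range(0, k, tile):
                p1 = min(p0 + tile, k)
                a_tile = slice_tile(a, i0, i1, p0, p1)
                b_tile = slice_tile(b, p0, p1, j0, j1)
                matmul_tile(a_tile, b_tile, c_tile)
            # Write the finished tile back (stand-in for a device-to-host copy).
            for di, row in enumerate(c_tile):
                c[i0 + di][j0:j1] = row
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b, tile=2))  # [[58.0, 64.0], [139.0, 154.0]]
```

Because only one tile of each operand needs to be resident at a time, the same loop structure works for matrices larger than device memory. A real GPU implementation would typically also overlap transfers with computation, which this sketch does not attempt.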