cuda-kernel topic
cuda-kernel repositories
torchsort
745 stars, 33 forks
Fast, differentiable sorting and ranking in PyTorch
how-to-optimize-gemm
583 stars, 78 forks
row-major matmul optimization
kernl
1.5k stars, 90 forks
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.