cuda-kernel topic

List cuda-kernel repositories

torchsort

745 stars, 33 forks

Fast, differentiable sorting and ranking in PyTorch

how-to-optimize-gemm

547 stars, 77 forks

Row-major matrix multiplication (GEMM) optimization

kernl

1.5k stars, 90 forks

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.