tensor-cores topic

List tensor-cores repositories

tGeMM

Stars: 26 · Forks: 7 · Watchers: 26

General Matrix Multiplication using NVIDIA Tensor Cores

cuda-programming, gpu-programming

ffpa-attn

Stars: 242 · Forks: 12 · Watchers: 242

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.