alexarmbr
# ❓ Questions and Help I am a CUDA programmer and want some practice writing efficient transformer-related ops for NVIDIA GPUs. Is there anywhere I could find a...
**Describe the bug** I tried to compile and run this code, which I copied directly from `03_tensor.md` ``` auto tv_layout = Layout
For educational purposes I am working on writing an fp16 GEMM kernel that is as performant as cuBLAS HGEMM. I am using CuTe tensors/layouts to handle the index calculations and shared...