Vijay Thakkar
https://github.com/NVIDIA/cutlass/tree/main/examples/55_hopper_mixed_dtype_gemm
Please see https://github.com/NVIDIA/cutlass/issues/1484
Yes, there's a fix coming for that specific issue imminently.
Additionally, for really small values of M, you are likely to be bandwidth bound anyway, in which case you can likely get roofline perf by recompiling CUTLASS 2.x Ampere kernels (with...
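To make the bandwidth-bound point concrete, here is a rough roofline sketch. It is not taken from CUTLASS; the function names are hypothetical, and the peak throughput and bandwidth figures are placeholder assumptions rather than the specs of any particular GPU. It also assumes every operand is moved through memory exactly once, which is the best case.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes A and B are each read once and C is written once
    (ideal reuse), with all three stored at bytes_per_elem.
    """
    flops = 2 * m * n * k  # one multiply and one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_bandwidth_bound(m, n, k, peak_flops, peak_bytes_per_s, bytes_per_elem=2):
    """True when the GEMM sits left of the roofline ridge point."""
    machine_balance = peak_flops / peak_bytes_per_s  # FLOPs per byte at the ridge
    return gemm_arithmetic_intensity(m, n, k, bytes_per_elem) < machine_balance


# Placeholder hardware numbers (assumptions for illustration only).
PEAK_FLOPS = 100e12      # 100 TFLOP/s
PEAK_BANDWIDTH = 1e12    # 1 TB/s

small_m = is_bandwidth_bound(1, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
large_m = is_bandwidth_bound(4096, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
```

With M = 1 the intensity is just under 1 FLOP per byte, because the weight matrix dominates the traffic, so the kernel is limited by memory bandwidth regardless of which tensor core instructions it uses. With M = 4096 the intensity is in the thousands and the same check reports compute bound.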
Yep, a dedicated epilogue for pointer array and grouped Gemm is coming before we tag 3.5
Hi! This is for CUTLASS version 3.1 which was released quite a few months ago (before the release of CUDA 12.4). Are you able to repro this with CUTLASS 3.4?...
That's an out-of-memory error, so it's likely an issue with the compiler or the system used to build the kernels? CC @mhoemmen
CC @hwu36 and @mhoemmen
Good catch :)
Honestly, no good technical reason. So my answer won't be satisfying to you. There are some subtleties in their usage and semantics, but generally they serve similar purposes.