Vijay Thakkar

Results 81 comments of Vijay Thakkar

https://github.com/NVIDIA/cutlass/tree/main/examples/55_hopper_mixed_dtype_gemm

Please see https://github.com/NVIDIA/cutlass/issues/1484

Yes, there's a fix coming for that specific issue imminently.

Additionally, for really small values of M, you are likely to be bandwidth bound anyway, for which you can likely get roofline perf from recompiling CUTLASS 2.x Ampere kernels (with...
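To illustrate the bandwidth-bound point for small M, here is a rough roofline sketch. It is not taken from CUTLASS: the function names are made up for illustration, the peak throughput and bandwidth figures are hypothetical placeholders rather than the specs of any real GPU, and the traffic model assumes each of A, B, and the output is moved through global memory exactly once.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_element=2):
    """FLOPs per byte for an (m x k) @ (k x n) GEMM.

    Assumes A, B and the output are each read or written exactly once,
    which is an idealized lower bound on memory traffic.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_bandwidth_bound(m, n, k, peak_flops, peak_bandwidth, bytes_per_element=2):
    """True if the GEMM sits below the roofline ridge point."""
    ridge = peak_flops / peak_bandwidth  # FLOPs per byte needed to saturate compute
    return gemm_arithmetic_intensity(m, n, k, bytes_per_element) < ridge


# Hypothetical hardware numbers, chosen only for illustration.
PEAK_FLOPS = 300e12      # FLOP/s
PEAK_BANDWIDTH = 1.5e12  # bytes/s

small_m = is_bandwidth_bound(1, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
square = is_bandwidth_bound(4096, 4096, 4096, PEAK_FLOPS, PEAK_BANDWIDTH)
print(small_m, square)
```

With M = 1 the intensity is roughly one FLOP per byte, far below the assumed ridge point of 200, so under these assumptions the kernel's runtime is set by how fast the weights can be streamed rather than by tensor core throughput.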

Yep, a dedicated epilogue for pointer array and grouped GEMM is coming before we tag 3.5.

Hi! This is for CUTLASS version 3.1 which was released quite a few months ago (before the release of CUDA 12.4). Are you able to repro this with CUTLASS 3.4?...

That's an out-of-memory error, so likely an issue with the compiler or the system used to build the kernels? CC @mhoemmen

Honestly, no good technical reason. So my answer won't be satisfying to you. There are some subtleties in their usage and semantics, but generally they serve similar purposes.