Vijay Thakkar comments

Results: 81 comments by Vijay Thakkar

Are there any examples of mixed input Gemm (A type fp16, B type fp8)?

https://github.com/NVIDIA/cutlass/tree/main/examples/55_hopper_mixed_dtype_gemm
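If you only need the semantics, a mixed-input GEMM multiplies operands of different element types, for example an FP16 A with an FP8 B. The Python reference below is a minimal sketch. It does not describe how that example is implemented. It assumes B is stored as OCP FP8 E4M3 (E4M3FN) bytes, upconverts each B element to a real value before multiplying, and accumulates in double precision. The helper names (`to_fp16`, `e4m3_to_float`, `mixed_gemm_reference`) are hypothetical.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def e4m3_to_float(code: int) -> float:
    """Decode one FP8 E4M3 byte (assumed OCP E4M3FN: bias 7, no infinities)."""
    sign = -1.0 if code & 0x80 else 1.0
    exp = (code >> 3) & 0xF
    man = code & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan                        # the only NaN pattern; no Inf
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)

def mixed_gemm_reference(a, b_codes):
    """C = A @ B with A as fp16 values (M x K) and B as E4M3 codes (K x N).

    B is upconverted element-wise before the multiply; accumulation uses
    Python floats (double precision), so this models the math only, not
    any kernel's rounding behaviour.
    """
    m, k = len(a), len(a[0])
    n = len(b_codes[0])
    assert len(b_codes) == k, "inner dimensions must match"
    a16 = [[to_fp16(x) for x in row] for row in a]
    b = [[e4m3_to_float(c) for c in row] for row in b_codes]
    return [[sum(a16[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]

# 0x38 encodes 1.0 and 0x40 encodes 2.0 in E4M3.
print(mixed_gemm_reference([[1.0, 2.0]], [[0x38], [0x40]]))  # [[5.0]]
```

A reference like this is mostly useful for checking a kernel's output within a tolerance; a real kernel often accumulates in FP32, so results will not match bit-for-bit.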

[BUG] Broken copy.hpp

Please see https://github.com/NVIDIA/cutlass/issues/1484

[BUG] Broken copy.hpp

Yes, a fix for that specific issue is coming imminently.

[FEA] FP8 grouped gemm kernel without TMA

Additionally, for really small values of M, you are likely to be bandwidth-bound anyway, in which case you can likely get roofline performance by recompiling CUTLASS 2.x Ampere kernels (with...
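A rough roofline check shows why small M tends to be bandwidth-bound. The sketch below assumes each operand crosses DRAM exactly once. It uses placeholder peak numbers (`PEAK_FLOPS`, `PEAK_BW`), not the specs of any real GPU. The 1-byte A/B and 2-byte C sizes are just one example of an FP8-in, FP16-out problem. The function names are hypothetical.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_a, bytes_b, bytes_c):
    """FLOPs per byte of DRAM traffic, assuming each operand moves once."""
    flops = 2 * m * n * k
    traffic = m * k * bytes_a + k * n * bytes_b + m * n * bytes_c
    return flops / traffic

def is_bandwidth_bound(intensity, peak_flops, peak_bytes_per_s):
    """Roofline test: below the ridge point, bandwidth limits throughput."""
    return intensity < peak_flops / peak_bytes_per_s

# Placeholder machine numbers (assumptions, not real hardware specs).
PEAK_FLOPS = 1000e12   # FLOP/s
PEAK_BW = 3e12         # bytes/s

ai = gemm_arithmetic_intensity(16, 4096, 4096, bytes_a=1, bytes_b=1, bytes_c=2)
print(round(ai, 2), is_bandwidth_bound(ai, PEAK_FLOPS, PEAK_BW))  # 31.63 True
```

When M is much smaller than N and K, the K x N operand dominates the traffic, so the intensity is roughly 2M / bytes_b (about 32 here). It grows only linearly with M and stays far below the ridge point, so a kernel that streams B at full bandwidth is already near the roofline.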

Allow setting a custom TmaDescriptor for TMAStore.

Yep, a dedicated epilogue for pointer-array and grouped GEMM is coming before we tag CUTLASS 3.5.
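Here is why pointer-array and grouped GEMM benefit from their own epilogue. In a grouped GEMM, each problem in the group can have a different output shape and pointer, so consecutive tiles processed by a persistent kernel may belong to different output tensors. A TMA descriptor encodes one tensor (base address, extents, strides), so presumably a single descriptor for C/D cannot address every group. The hypothetical `map_tile` helper below is a simplified sketch of mapping a flat tile index back to a group and tile coordinate. It is not CUTLASS's tile scheduler.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive ints."""
    return -(-a // b)

def map_tile(groups, tile_m, tile_n, tile_id):
    """Map a flat tile index to (group, tile_row, tile_col).

    groups: list of (M, N) output shapes, one per GEMM in the group.
    Tiles are numbered group by group, row-major within each group.
    """
    for g, (m, n) in enumerate(groups):
        tiles_n = ceil_div(n, tile_n)
        count = ceil_div(m, tile_m) * tiles_n
        if tile_id < count:
            return g, tile_id // tiles_n, tile_id % tiles_n
        tile_id -= count
    raise IndexError("tile_id exceeds the total tile count")

groups = [(128, 128), (256, 128)]
print([map_tile(groups, 128, 128, t) for t in range(3)])
# [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
```

Once a tile knows its group, the epilogue can select that group's output pointer (and, under this sketch's assumptions, that group's descriptor) before storing.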

[BUG] Unable to build against CUDA 12.4 without

Hi! This is for CUTLASS 3.1, which was released quite a few months ago (before CUDA 12.4). Are you able to repro this with CUTLASS 3.4?...

[BUG] Unable to build against CUDA 12.4 without

That's an out-of-memory error, so it's likely an issue with the compiler or with the system used to build the kernels? CC @mhoemmen

[BUG] Unable to build against CUDA 12.4 without

CC @hwu36 and @mhoemmen

Fix SMEM index for C in CuTe examples

Good catch :)

[QST]Array in cute and in cutlass

Honestly, there's no good technical reason, so my answer won't be satisfying to you. There are some subtleties in their usage and semantics, but generally they serve similar purposes.