Vijay Thakkar comments

Results: 81 comments of Vijay Thakkar

[QST] Gather/Scatter in cute/cutlass 3

I'm on vacation for a couple weeks but can help asynchronously on this thread. The concepts presented in example 52 are applicable pretty much 1:1 over here. I'm specifically referring...

[QST] Gather/Scatter in cute/cutlass 3

> that GEMMs produced via the CUTLASS 3 API for CC < 90 are not currently as well optimized as those produced via the CUTLASS 2 API That said, do...

[QST] Gather/Scatter in cute/cutlass 3

Hi! We just released CUTLASS 3.5 and it contains an [example of a CUTLASS 3.x-based gather/scatter convolution kernel](https://github.com/NVIDIA/cutlass/tree/main/examples/59_ampere_gather_scatter_conv)

[QST]What is operator? How we use operator? (To access tensor elements)

I don't understand your question. The operators are accessing the underlying values within the tensor by accepting an input coordinate and returning either a Tensor or a single value for...

[BUG] make_tiled_copy should not assume 2d data, Thr and Val layouts

@ccecka

[QST] how to avoid register spill for example 48

tagging @IonThruster and @ANIKET-SHIVAM as well

[QST] how to avoid register spill for example 48

@zhang662817 have you been able to use the NT layout TF32 kernel from the CUTLASS profiler? You can copy its configuration since we know that one does not spill and...

[QST] how to avoid register spill for example 48

> What's the difference between float and tf32? In cutlass, float uses tf32 tcore and tf32 also uses 32 bit in storage in shared smem and register file, is that right?...

[QST] how to avoid register spill for example 48

Have you tried out the kernel from the profiler corresponding to this layout? I forget the optimized configuration that does not spill for TF32, but it is present in our...

[BUG] Illegal CUDA shared memory access in SM90 GEMM TMA Warpspecialized at ClusterBarrier::init

I see a `-g` flag in your nvcc command line. Does the issue occur if you remove the `-g`? Additionally, does this issue persist if you change the `-O1` to...