Vijay Thakkar

81 comments by Vijay Thakkar

I'm on vacation for a couple weeks but can help asynchronously on this thread. The concepts presented in example 52 are applicable pretty much 1:1 over here. I'm specifically referring...

> that GEMMs produced via the CUTLASS 3 API for CC < 90 are not currently as well optimized as those produced via the CUTLASS 2 API

That said, do...

Hi! We just released CUTLASS 3.5 and it contains an [example of CUTLASS 3.x based gather/scatter convolution kernel](https://github.com/NVIDIA/cutlass/tree/main/examples/59_ampere_gather_scatter_conv)

I don't understand your question. The operators are accessing the underlying values within the tensor by accepting an input coordinate and returning either a Tensor or a single value for...

tagging @IonThruster and @ANIKET-SHIVAM as well

@zhang662817 have you been able to use the NT layout TF32 kernel from the CUTLASS profiler? You can copy its configuration since we know that one does not spill and...

> What's the difference between float and tf32? In CUTLASS, float uses tf32 tcore and tf32 also uses 32 bits of storage in smem and the register file, is that right?...

Have you tried out the kernel from the profiler corresponding to this layout? I forget the optimized configuration that does not spill for TF32, but it is present in our...

I see a `-g` flag in your nvcc command line. Does the issue occur if you remove the `-g`? Additionally, does this issue persist if you change the `-O1` to...