Vijay Thakkar comments

Results: 81 comments by Vijay Thakkar

[QST]question about cutlass epilogue customization

btw, I do not want to discourage you from using the 3.x API on Ampere; it's totally kosher. We just recommend the 2.x API for the best-performing Ampere kernels since they...

[QST]question about cutlass epilogue customization

the alternative solution is to just launch two different kernels on two separate streams, which will likely give you equivalent or perhaps even better perf depending on the problem shapes...

[QST] Is it possible to detect output coordinates in elementwise epilogue ?

For CUTLASS 3.x epilogues based on CuTe, it's trivial to inject the coordinate from the collective epi into the thread functor. We already create the coordinate tensor for the purposes...

[QST] Is it possible to detect output coordinates in elementwise epilogue ?

@hwu36 can help answer for 2.x API epilogues.

[QST] Is there any INT8 GEMM with INT8 alpha and beta?

because shaving 4 bytes down to 1 byte for a single load per tile does not change the perf at all. Changing fp32 multiplication to int8 will also not move...

[QST] Is there any INT8 GEMM with INT8 alpha and beta?

Although I doubt it, you can certainly try int8 alpha/beta to see if it would help in this case. What you would have to do is modify the epilogue thread...

[QST] Is s8 * s8 = {s32, s8} supported in cuTe?

Without more info than what you've given, all I can say is "yes". The int8 atoms exist for all archs.

[BUG] w4a8 mixed-input gemm for fine-grained quantization

@rawnhenry are we missing a static assert somewhere in the collective for valid tile shapes?

[QST] What is the difference between make_shape and make_tile?

A tile is a tuple of layouts. If you divide with a shape, that is equivalent to dividing with a tile of trivial layouts (layouts that have the same shape,...

[QST] Gather/Scatter in cute/cutlass 3

Sounds like you want a grouped gemm that supports gather/scatter? Have you taken a look at example 52 for inspiration? Happy to help with the design, but using CuTe is...