cutlass [QST] Using customized operators in GEMM kernel

Hi, I am wondering whether there is a way to replace some of the operators in a GEMM in CUTLASS. For example, if I want to replace the multiplication with addition and the accumulation with shifting, is there a way to modify only these operations while keeping the rest of the GEMM unchanged?

May 03 '25 14:05 CoffeeCat3008871

I'm not one of the CUTLASS/CuTe devs, but if you take something like sgemm_sm80.cu, you can find the sizes of the modes of the tCrA, tCrB, and tCrC fragments (by printing them or guessing). Then, instead of calling gemm(...), you can do:

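// Drop-in for gemm(...) inside the existing k_block loop of the example.
// Assumes an FMA-style atom, so the same mma index is valid for A, B, and C.
CUTE_UNROLL
for (int mma = 0; mma < size<0>(tCrC); mma++) {
    CUTE_UNROLL
    for (int m = 0; m < size<1>(tCrC); m++) {
        CUTE_UNROLL
        for (int n = 0; n < size<2>(tCrC); n++) {
            // Swap in any per-element operation here.
            tCrC(mma,m,n) += tCrA(mma,m,k_block) * tCrB(mma,n,k_block);
        }
    }
}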

tCrA holds elements of type TA, tCrB of type TB, and so on. Inside those loops you can do anything you could do in a regular CUDA kernel.
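When the inner operation is changed like this, a plain host-side reference is handy for checking the kernel's output. The following is a minimal sketch in standard C++, not part of CUTLASS or CuTe: the function name reference_sub_gemm and the row-major storage are assumptions made for illustration, and subtraction is used as an arbitrary stand-in for the replaced multiply. It keeps the same (m,k) and (n,k) indexing as the loop nest above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference: C(m,n) += A(m,k) - B(n,k).
// A is M x K, B is N x K, C is M x N, all stored row-major.
template <class T>
void reference_sub_gemm(std::size_t M, std::size_t N, std::size_t K,
                        const std::vector<T>& A,
                        const std::vector<T>& B,
                        std::vector<T>& C)
{
    assert(A.size() == M * K && B.size() == N * K && C.size() == M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            T acc = C[m * N + n];
            for (std::size_t k = 0; k < K; ++k) {
                // The only line that differs from an ordinary GEMM.
                acc += A[m * K + k] - B[n * K + k];
            }
            C[m * N + n] = acc;
        }
    }
}
```

Comparing the device result against this function element by element should catch indexing mistakes in the hand-written loop nest.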

May 03 '25 14:05 capybara-club

Thanks, I will take a look. So in sgemm_sm80.cu the gemm(...) call can be replaced by a for-loop nest annotated with CUTE_UNROLL. I am wondering whether this also applies to CUTLASS proper, or whether there is a similar way to get the equivalent of this loop nest in CUTLASS?

May 03 '25 15:05 CoffeeCat3008871

Ohhh, I'm sorry. I guess I projected my use case onto yours. I had a similar problem and got it going by adapting that CuTe example. I'm doing a subtraction instead of a multiply in the GEMM, and I figured I was stuck with a SIMT kernel that tensor cores couldn't help with anyway. I'm not sure about CUTLASS proper. You might be able to find examples in the cutlass folder, or hopefully someone else here can help you.

May 03 '25 15:05 capybara-club

No problem, thank you very much for the inspiring references.

May 03 '25 15:05 CoffeeCat3008871

You could check this:

https://github.com/hpcgarage/cuASR

cc @thakkarV
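For context, cuASR is, as far as I understand it, a CUTLASS-based library for GEMM over semirings, where the usual add and multiply are swapped for another pair of operators. The sketch below illustrates that idea in plain host C++ and does not use cuASR's actual API: the names MinPlus, PlusMultiplies, and semiring_gemm are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical semiring: "add" replaces +, "mul" replaces *.
struct MinPlus {
    // Identity element of add, used to initialize the accumulator.
    static float zero() { return std::numeric_limits<float>::infinity(); }
    static float add(float x, float y) { return std::min(x, y); }
    static float mul(float x, float y) { return x + y; }
};

// The ordinary GEMM expressed through the same interface.
struct PlusMultiplies {
    static float zero() { return 0.0f; }
    static float add(float x, float y) { return x + y; }
    static float mul(float x, float y) { return x * y; }
};

// C = A (M x K) "times" B (K x N) over the given semiring, row-major.
template <class Ring>
std::vector<float> semiring_gemm(std::size_t M, std::size_t N, std::size_t K,
                                 const std::vector<float>& A,
                                 const std::vector<float>& B)
{
    std::vector<float> C(M * N, Ring::zero());
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                C[m * N + n] = Ring::add(C[m * N + n],
                                         Ring::mul(A[m * K + k], B[k * N + n]));
    return C;
}
```

With MinPlus this computes one relaxation step of all-pairs shortest paths. With PlusMultiplies it reduces to a regular matrix product, which makes it easy to sanity-check the loop structure.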

May 06 '25 02:05 hwu36

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

Jun 05 '25 02:06 github-actions[bot]

You can write your own MMA/FMA Ops and Traits to create custom atoms.

See the MMAs made for complex inputs, for example.
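To give a rough idea of what such an Op looks like, here is a standalone mock-up that compiles without CuTe. It mirrors the shape of CuTe's UniversalFMA as I recall it (register typedefs plus a static fma), so check the member names against your CUTLASS version. SubAccumulateOp and apply_op are hypothetical names, and the subtraction is only an example operation.

```cpp
#include <cstddef>

// Mock-up of a custom "Op": one value per register array, d = c + (a - b).
// Does not include or depend on CuTe.
template <class D, class A = D, class B = A, class C = D>
struct SubAccumulateOp {
    using DRegisters = D[1];
    using ARegisters = A[1];
    using BRegisters = B[1];
    using CRegisters = C[1];

    static void fma(D& d, A const& a, B const& b, C const& c) {
        d = c + (a - b);  // custom operation in place of c + a * b
    }
};

// In real CuTe code, a matching MMA_Traits specialization would describe the
// value types, the MNK shape, and the thread/value layouts of the atom.
// This helper only stands in for that machinery on the host.
template <class Op, class T, std::size_t M, std::size_t N>
void apply_op(T (&c)[M][N], const T (&a)[M], const T (&b)[N]) {
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            Op::fma(c[m][n], a[m], b[n], c[m][n]);
}
```

The point of the Op/Traits split is that the rest of the GEMM (tiling, copies, the gemm(...) call) stays unchanged and only the atom is swapped.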

Jun 05 '25 02:06 ccecka

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

Oct 03 '25 05:10 github-actions[bot]