[QST] Using customized operators in GEMM kernel
Hi, I am wondering if there is a way to replace some of the operators in a CUTLASS GEMM. For example, if I want to replace the multiplication with an addition and the accumulation with a shift, is there a way to modify only these operations while keeping the rest of the GEMM unchanged?
I'm not one of the CUTLASS/CuTe devs, but if you take something like sgemm_sm80.cu, you can find the sizes of the modes of the tCrA, tCrB, and tCrC fragments (by printing them or by inspection). Then, instead of calling gemm(...), you can write the loops yourself:
    CUTE_UNROLL
    for (int mma = 0; mma < size<0>(tCrC); mma++) {
      CUTE_UNROLL
      for (int m = 0; m < size<1>(tCrC); m++) {
        CUTE_UNROLL
        for (int n = 0; n < size<2>(tCrC); n++) {
          tCrC(mma,m,n) += tCrA(mma,m,k_block) * tCrB(mma,n,k_block);
        }
      }
    }
tCrA holds elements of type TA, tCrB of type TB, and so on. Inside those loops you can do anything you could do in a regular CUDA kernel.
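To make the idea concrete outside of CuTe, here is a minimal host-side sketch in plain C++ (no CUTLASS headers) of a GEMM-shaped loop nest in which the multiply and the accumulate are passed in as functors. The names `custom_gemm`, `min_plus_gemm`, `combine`, and `reduce` are made up for illustration and are not part of CUTLASS or CuTe. The min-plus choice is just one example of swapped operators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Reference GEMM-shaped loop nest with pluggable scalar operators.
// combine(a, b) takes the place of the multiply,
// reduce(acc, x) takes the place of the accumulate.
// A is M x K and B is K x N, both row-major; the result is M x N, row-major.
template <class T, class Combine, class Reduce>
std::vector<T> custom_gemm(const std::vector<T>& A, const std::vector<T>& B,
                           std::size_t M, std::size_t N, std::size_t K,
                           T init, Combine combine, Reduce reduce) {
  assert(A.size() == M * K && B.size() == K * N);
  std::vector<T> C(M * N, init);  // init should be the identity of reduce
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      for (std::size_t k = 0; k < K; ++k) {
        C[m * N + n] = reduce(C[m * N + n], combine(A[m * K + k], B[k * N + n]));
      }
    }
  }
  return C;
}

// Example instance (hypothetical helper): multiply replaced by addition,
// accumulate replaced by min, i.e. a min-plus product.
inline std::vector<int> min_plus_gemm(const std::vector<int>& A,
                                      const std::vector<int>& B,
                                      std::size_t M, std::size_t N, std::size_t K) {
  return custom_gemm<int>(
      A, B, M, N, K, std::numeric_limits<int>::max(),
      [](int a, int b) { return a + b; },
      [](int acc, int x) { return std::min(acc, x); });
}
```

Passing `a * b` and `acc + x` with `init = 0` gives back the ordinary matrix product, which is a handy sanity check before swapping in other operators.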
Thanks, I will take a look. So in sgemm_sm80.cu the gemm call can be replaced by a nest of for-loops marked with CUTE_UNROLL. Does the same apply to CUTLASS proper, or is there a similar way to get the equivalent of this loop nest there?
Ohhh, I'm sorry. I guess I projected my use case onto yours. I had a similar problem and got it working by adapting that CuTe example. I'm doing a subtraction instead of a multiply in the GEMM, and I figured I was stuck with a SIMT kernel that tensor cores couldn't help with anyway. I'm not sure about CUTLASS proper. You might be able to find examples in the cutlass folder, or hopefully someone else here can help you.
No problem, thanks a lot for the helpful references.
You could check this:
https://github.com/hpcgarage/cuASR
cc @thakkarV
You can write your own MMA/FMA Ops and Traits to create custom atoms.
See the MMAs made for complex inputs, for example.
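As a rough, dependency-free sketch of the "custom op" idea: the struct below is loosely modeled on the shape of CuTe's `UniversalFMA` op, which exposes a static `fma(d, a, b, c)`. It does not include any CuTe headers, and it leaves out the `MMA_Traits` specialization that a real atom also needs. `AbsDiffFMA` and `tile_gemm` are hypothetical names for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>

// Hypothetical op: d = c + |a - b|, i.e. the multiply in a
// fused multiply-add is replaced by an absolute difference.
struct AbsDiffFMA {
  static void fma(int& d, int const& a, int const& b, int const& c) {
    d = c + std::abs(a - b);
  }
};

// Tiny register-tile "gemm" parameterized on the op.
// a is M x K, b is N x K (both indexed [row][k]), c is M x N.
// The loop structure never changes; only Op::fma decides the arithmetic.
template <class Op, std::size_t M, std::size_t N, std::size_t K>
void tile_gemm(const std::array<std::array<int, K>, M>& a,
               const std::array<std::array<int, K>, N>& b,
               std::array<std::array<int, N>, M>& c) {
  for (std::size_t k = 0; k < K; ++k) {
    for (std::size_t m = 0; m < M; ++m) {
      for (std::size_t n = 0; n < N; ++n) {
        Op::fma(c[m][n], a[m][k], b[n][k], c[m][n]);
      }
    }
  }
}
```

The point of the pattern is that the arithmetic lives in one small type, so the tiling and data movement around it stay untouched when the operator changes.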