[QST] How to implement a fused mixed precision matrix multiplication such as w4a4 + w16a16?
Dear Team,
I would like to implement a fused mixed-precision matrix multiplication such as w4a4 + w16a16, where the w16a16 part is small. One use case for such a kernel is accelerating an LLM with LoRA applied.
I found some examples in "torchao" that implement w4a4/w4a8 matrix multiplication and fuse dequantization into the GEMM via the epilogue, but I don't know how to integrate a w16a16 matrix multiplication on top of that. Are there any examples I can refer to?
Could you please elaborate on the inputs and outputs of each step? Do you want to fuse two GEMMs into one kernel, similar to what our ex.13 does?
Thank you very much for your reply! The inputs are two activations, $X_1$ of shape $[L, D_1]$ and $X_2$ of shape $[L, D_2]$, and two weight matrices, $W_1$ of shape $[D, D_1]$ and $W_2$ of shape $[D, D_2]$, where $L = 2048, D_1 = 4096, D_2 = 64, D = 4096$. The output is $Y = X_1 W_1^\top + X_2 W_2^\top$. $X_1$ and $W_1$ are quantized to 4 bits, while $X_2$ and $W_2$ stay in 16-bit precision.
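To make the shapes concrete, the sum can also be written as a single product over a concatenated reduction dimension. This is only an algebraic restatement of the formula above, not a claim about how any existing kernel is implemented, since the two blocks use different data types:

$$
Y = X_1 W_1^\top + X_2 W_2^\top
  = \begin{bmatrix} X_1 & X_2 \end{bmatrix}
    \begin{bmatrix} W_1 & W_2 \end{bmatrix}^\top ,
\qquad
\begin{bmatrix} X_1 & X_2 \end{bmatrix} \in \mathbb{R}^{L \times (D_1 + D_2)},\;
\begin{bmatrix} W_1 & W_2 \end{bmatrix} \in \mathbb{R}^{D \times (D_1 + D_2)} .
$$

With the sizes given, the reduction length is $D_1 + D_2 = 4096 + 64 = 4160$, so the 16-bit part accounts for $64 / 4160 \approx 1.5\%$ of the multiply-accumulates per output element.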
I think the difference from ex.13 is that I need to add the results of the two GEMMs, rather than compute the second GEMM from the result of the first.
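To pin down the intended semantics, here is a minimal pure-Python reference sketch of the computation, written so that both GEMMs accumulate into the same output element. It is not CUTLASS or torchao code. The helper names `quantize_rows_int4` and `fused_reference` are hypothetical, and the symmetric per-row scaling (per token for $X_1$, per output channel for $W_1$) is an assumption, since the thread does not say which quantization scheme is used.

```python
def quantize_rows_int4(mat):
    """Symmetric per-row quantization to the signed 4-bit range [-8, 7].

    Assumption: one scale per row. Returns (quantized rows, scales).
    """
    q_rows, scales = [], []
    for row in mat:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        q_rows.append([max(-8, min(7, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def fused_reference(x1, w1, x2, w2):
    """Compute Y = dequant(Q(X1) @ Q(W1)^T) + X2 @ W2^T.

    x1: [L][D1], w1: [D][D1]  (quantized to 4 bits here)
    x2: [L][D2], w2: [D][D2]  (kept in floating point)
    """
    qx, sx = quantize_rows_int4(x1)  # one scale per token (row of X1)
    qw, sw = quantize_rows_int4(w1)  # one scale per output channel (row of W1)
    y = []
    for i in range(len(x1)):
        out_row = []
        for j in range(len(w1)):
            # Reduction 1: integer accumulation over D1 (the w4a4 part).
            acc_int = sum(a * b for a, b in zip(qx[i], qw[j]))
            # Dequantize the integer accumulator with both scales.
            acc = acc_int * sx[i] * sw[j]
            # Reduction 2: float accumulation over D2 (the w16a16 part),
            # added into the same output element instead of chained.
            acc += sum(a * b for a, b in zip(x2[i], w2[j]))
            out_row.append(acc)
        y.append(out_row)
    return y
```

Unlike a back-to-back fusion, the second reduction here never reads the output of the first; the two only meet in the accumulator. A fused kernel could follow the same order (integer main loop, scale, short 16-bit main loop, then store), but whether that maps cleanly onto an existing example is exactly the open question in this thread.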
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
BD
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.