cutlass
[BUG] Misaligned address when running GEMM with SM90 EVT-based epilogue
When I run a standalone Cutlass GEMM with a generated SM90 EVT-based epilogue, I get a CUDA misaligned-address error. The epilogue loads two auxiliary inputs: one broadcast, and one with full dimensionality. All participating tensors are aligned to at least 512 bytes, so this could be a Cutlass bug. On manual inspection, I could not find a problem in the user-side code.
I used cuda-gdb to break at the point of error and ran "x/10i $errorpc". The output shows that the CUDA instruction pointer is on a "UTMALDG.2D" SASS instruction, which is where the error happens.
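For context, `UTMALDG` is the SASS instruction for a TMA (Tensor Memory Accelerator) tile load. The CUDA driver documentation for `cuTensorMapEncodeTiled` lists requirements for a non-interleaved tensor's TMA descriptor:

- The global base address must be 16-byte aligned.
- The global strides must be multiples of 16 bytes and below 2^40.

The sketch below checks those conditions on the host, assuming a dense, non-interleaved layout. `is_aligned` and `tma_global_layout_ok` are hypothetical helpers, not CUTLASS API. The 512-byte base alignment reported above satisfies the address check on its own, but the strides are a separate condition.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if `ptr` is a multiple of `alignment` bytes.
bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Hypothetical helper mirroring the documented global-memory constraints of a
// non-interleaved TMA descriptor (assumption: dense layout, no interleave).
// `strides_bytes` holds the byte strides of every dimension except the
// innermost one, whose stride is implicitly the element size.
bool tma_global_layout_ok(const void* base,
                          const std::vector<std::uint64_t>& strides_bytes) {
  if (!is_aligned(base, 16)) return false;              // base: 16-byte aligned
  for (std::uint64_t s : strides_bytes) {
    if (s % 16 != 0) return false;                       // multiple of 16 bytes
    if (s >= (std::uint64_t{1} << 40)) return false;     // below 2^40
  }
  return true;
}
```

For a row-major fp16 matrix with N columns, the single stride passed here would be `N * 2` bytes.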
One of the auxiliary inputs (called X) is also operand A of the GEMM.
Code to reproduce, environment info and build / run instructions are here: https://gist.github.com/kadeng/1e44299d22ce5a11da55ad0e5f328d3f
The code is generated by the experimental Cutlass backend for PyTorch's Inductor JIT compiler.
If I change this example so that the auxiliary operand has the same shape as operand A but is a different tensor (a different pointer), the error does not happen. So it's likely an address conflict. Is there anything that can be done to allow this? Residual connections like this are very common, e.g. something like "activation(a @ b) + a".
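The sketch below is a host-side reference for the intended epilogue semantics, "activation(a @ b) + a", with square n x n row-major matrices. Two names are assumptions for illustration: ReLU stands in for the unspecified activation, and `residual_gemm_reference` is a hypothetical function name. Passing the same buffer as both `A` and `aux` only means reading one read-only tensor twice. Semantically, aliasing the operand with the auxiliary input should therefore be harmless.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: D = relu(A * B) + aux for square n x n row-major
// matrices. ReLU is an assumed stand-in for "activation". In the reported
// failing case, `aux` is the very same buffer as `A`.
std::vector<float> residual_gemm_reference(const std::vector<float>& A,
                                           const std::vector<float>& B,
                                           const std::vector<float>& aux,
                                           std::size_t n) {
  std::vector<float> D(n * n);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;  // one output element of A * B
      for (std::size_t k = 0; k < n; ++k) {
        acc += A[i * n + k] * B[k * n + j];
      }
      // Epilogue: activation, then element-wise residual add.
      D[i * n + j] = std::max(acc, 0.0f) + aux[i * n + j];
    }
  }
  return D;
}
```

Calling it with `aux` set to `A` mirrors the failing configuration. It can then serve as a correctness check for the GPU output.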
@thakkarV @richardmcai
Additional info: in my tests, it makes no difference to add -DNDEBUG and -O3 to the build flags and remove -g and -lineinfo.
@kadeng can you clarify what the shapes of these operands are supposed to be? EVT currently only supports two cases: loading MNL-shaped tensors, or broadcasting scalars/vectors to MNL-shaped tensors. It's not clear to me what the result of `activation(a @ b) + a` is supposed to be. `activation(a @ b)` has shape MNL and `a` has shape MKL, unless we assume that N == K here.
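The shape constraint described above can be sketched as follows: classify an auxiliary input's shape against the GEMM output shape (M, N). This is an illustrative helper with assumed names (`AuxKind`, `classify_aux`), not the EVT API. It omits the batch mode L for brevity. With `a` of shape (M, K) as the auxiliary input, it classifies as a full tensor only when K == N. That holds when A and B are square matrices of the same shape.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical categories for an auxiliary input relative to an (M, N) output.
enum class AuxKind { Full, RowVector, ColVector, Scalar, Unsupported };

// Classifies an aux tensor of shape (rows, cols) against a GEMM output of
// shape (M, N). Only exact matches or scalar/vector broadcasts are accepted.
AuxKind classify_aux(std::int64_t rows, std::int64_t cols,
                     std::int64_t M, std::int64_t N) {
  if (rows == M && cols == N) return AuxKind::Full;       // element-wise load
  if (rows == 1 && cols == 1) return AuxKind::Scalar;     // broadcast to (M, N)
  if (rows == 1 && cols == N) return AuxKind::RowVector;  // one value per column
  if (rows == M && cols == 1) return AuxKind::ColVector;  // one value per row
  return AuxKind::Unsupported;
}
```

For example, with M = 128, N = 64 and `a` of shape (128, 32), the helper returns `Unsupported` because K != N.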
> If this example is changed such that the loaded auxiliary operand is of the same shape but not the same (pointer) as operand A, the error does not happen.

Do you mean not of the same shape as A?
If I remember correctly, A and B are both square matrices of the same shape here. The gist has the details, including a standalone source code example that reproduces the error and all the shapes.
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
@kadeng did you resolve your issue?
No, this is still a bug as far as I can tell. It's not urgent, though, since we're not using auxiliary inputs anymore.
@apuaaChen, your first assignment :)