triton
[AMD] MFMA shortcut test
This PR:
- moves the shortcut check above the allocation code, before any scratch buffer shape is computed
- raises the priority of AMD-specific conversions over common conversions
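To illustrate the first point, here is a minimal, self-contained sketch of the "check the shortcut before sizing the scratch buffer" ordering. It is not the actual Triton code: `Conversion`, `coveredByShortcut`, `scratchBytesViaSharedMemory`, and `scratchBytes` are hypothetical names, and the size formula is a simplification chosen only for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a layout conversion; not Triton's real data structure.
struct Conversion {
  std::vector<int64_t> shape; // tensor shape
  int64_t elemBytes;          // bytes per element
  bool coveredByShortcut;     // true if no shared-memory round trip is needed
};

// Hypothetical helper: scratch size for conversions that go through shared memory.
// It must never be reached for conversions covered by the shortcut.
int64_t scratchBytesViaSharedMemory(const Conversion &cvt) {
  assert(!cvt.coveredByShortcut && "shape helpers must not run on shortcut conversions");
  int64_t elems = 1;
  for (int64_t dim : cvt.shape)
    elems *= dim;
  return elems * cvt.elemBytes;
}

// The shortcut check comes first, so no shape or size computation runs for it.
int64_t scratchBytes(const Conversion &cvt) {
  if (cvt.coveredByShortcut)
    return 0;
  return scratchBytesViaSharedMemory(cvt);
}
```

With this ordering, a conversion covered by the shortcut requests no scratch memory and never reaches the shape helper.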
LGTM. But given @zhanglx13 is already on this I'll defer to him to approve.
The changes in this PR make sense to me. However, the lit test passes on the current tip of the main branch. Can you double-check whether the fix in this PR is still needed? The originally failing test may pass now, in which case you need a new lit test.
@zhanglx13
Yes, the current ToT passes this test.
Change https://github.com/triton-lang/triton/pull/3791/files#diff-c05cf3aed297bf0c5f1296cc40c522b00fb300c7a4340a1f6be5b0bbe2c42039R1557 fixed the original problem, but it did not introduce any tests and did not fix the higher-level issue: getShapePerCTATileForDotOperands should not be called in the first place.
The main purpose of this PR is to add a simple regression test and to align the AMD implementation with the NVIDIA one (i.e., do not compute allocation sizes for tensors that are covered by the shortcut optimization).
This PR also separates the priorities of the general and AMD-specific conversion passes to guarantee that AMD-specific conversions are applied first.
I cannot reproduce the wrongly applied conversions issue consistently. It looks like it depends on the in-memory layout of data structures, which differs with every build/workload.
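As a rough illustration of why separate priorities help, here is a simplified sketch. It assumes priorities behave like a numeric benefit where higher values are tried first (similar in spirit to MLIR's `PatternBenefit`). `Pattern`, `selectPattern`, and the operation names are hypothetical and exist only for this example.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified pattern: a name, a priority, and a match predicate.
struct Pattern {
  std::string name;
  int benefit; // higher benefit is tried first
  std::function<bool(const std::string &)> matches;
};

// Returns the name of the first matching pattern in decreasing-benefit order,
// or an empty string if nothing matches.
std::string selectPattern(std::vector<Pattern> patterns, const std::string &op) {
  std::stable_sort(patterns.begin(), patterns.end(),
                   [](const Pattern &a, const Pattern &b) {
                     return a.benefit > b.benefit;
                   });
  for (const Pattern &p : patterns)
    if (p.matches(op))
      return p.name;
  return "";
}
```

In this sketch, patterns with equal benefit fall back to registration order. Giving the AMD-specific pattern a strictly higher benefit makes the selection independent of that order.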