triton [AMD] MFMA shortcut test

This PR:

- moves the shortcut check above the allocation code, before any scratch buffer shape is computed
- raises the priority of AMD-specific conversions over common conversions

Apr 29 '24 18:04 binarman

LGTM. But given @zhanglx13 is already on this I'll defer to him to approve.

May 23 '24 00:05 antiagainst

The changes in this PR make sense to me. However, the lit test passes with the current tip of the main branch. Can you double-check whether the fix in this PR is still needed? The originally failing test may now pass, in which case you need a new lit test.

May 25 '24 02:05 zhanglx13

@zhanglx13

> The changes in this PR makes sense to me. However, the lit test can pass with current tip of the main branch. So can you double check if the fix in this PR is still needed? Maybe the original failed test can pass. And you need a new lit test.

Yes, the current ToT (tip of tree) passes this test.

The change at https://github.com/triton-lang/triton/pull/3791/files#diff-c05cf3aed297bf0c5f1296cc40c522b00fb300c7a4340a1f6be5b0bbe2c42039R1557 fixed the original problem. However, it did not add any tests and did not fix the "high-level" issue: getShapePerCTATileForDotOperands should not be called in the first place.

The main purpose of this PR is to add a simple regression test and to align the AMD implementation with the NVIDIA one (i.e., not computing allocation sizes for tensors covered by the shortcut optimization).

This PR also separates the priorities of the general and AMD-specific conversion passes to guarantee that AMD-specific conversions are applied first.
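Separating priorities matters because MLIR-style drivers try patterns with a higher benefit before those with a lower one. The toy Python model below illustrates this; it is not the MLIR or Triton API. `Pattern`, `COMMON_BENEFIT`, `AMD_BENEFIT`, `apply_first_match` and the pattern names used with them are hypothetical. It shows how a strictly higher benefit makes an AMD-specific pattern win deterministically over a common pattern that matches the same op.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pattern:
    name: str
    benefit: int  # higher benefit is tried first
    matches: Callable[[str], bool]


# Hypothetical benefit values: AMD-specific patterns get a strictly higher benefit.
COMMON_BENEFIT = 1
AMD_BENEFIT = COMMON_BENEFIT + 1


def apply_first_match(patterns, op):
    """Return the name of the first pattern, in descending benefit order, that matches op."""
    # sorted() is stable, so equal-benefit patterns keep registration order in this model;
    # distinct benefits make the outcome independent of that tie-breaking.
    for p in sorted(patterns, key=lambda p: p.benefit, reverse=True):
        if p.matches(op):
            return p.name
    return None
```

Suppose a common `convert_layout` pattern is registered first and an AMD-specific one second. `apply_first_match` still selects the AMD pattern because of its higher benefit. With equal benefits, the result would hinge on tie-breaking. The thread below reports that this kind of ordering was not reproducible in practice.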

I cannot reproduce the wrongly applied conversions issue consistently. It seems to depend on the layout of in-memory data structures, which differs from build to build and between workloads.

May 27 '24 16:05 binarman
