triton
triton copied to clipboard
[AMD][WMMA] Support dot3d
This PR enables support of 3d dot and fixes tests in test_core.py
+cc @joviliast
LGTM Thank you for this PR.
Have you run test_dot locally on navi ?
Have you run test_dot locally on navi ?
yes
- fp16->fp32 tests pass
- fp16->fp16 some tests pass, some fail due to mismatches (few values exceed tolerance threshold)
- fp32->fp32 are not supported in WMMA, they go through FMA pipeline and fail
- int8 tests do not pass, but this is expected, test_dot do not work at the moment as well
@antiagainst hi! From my experience, most of dot issues are related to wrong indexing/address computations, which requires large and complex lit test to check. Such test will be extremely fragile, very complex and hard to read.
@joviliast is working on same code, so even if I add lit test, it will probably break in near future adding more redundant work to him(or me, it depends who will merge changes first).
I can implement some basic test, which will check that there are no crashes, but in my opinion this test does not guarantee much.
P.s. We have some basic llir interpreter which can help checking changes from this PR, but at this point it requires some massive work. I prefer to invest time in this task, if correctness on Navi aligns with our team priorities.
@antiagainst PTAL
Regarding tests, I treat it as what we want to invest to guard against future breakages. We don't have RDNA CI; so this can easily regress. Compared to the efforts spent on writing some tests now (which is mostly one time), I'm more concerned about the potential time lost on debugging all these complex logic in the future only via integration python tests in a sense. And we don't know how many regressions we will see throughout the journey.
Also btw lit tests don't need to be super detailed and cover all the lines; we can just cover important parts so it's not a change detector. I don't think it's a lot of effort to update them, given that the index caculation doesn't change frequently I believe. And whatever we change there it's delibrate--it can help for folks touching the code to verify their changes too. (Keep in mind that there are contributors that only do MFMA parts--they will not run their changes on some RDNA cards to verify things pass. let alone folks only touching nvidia support. But a lit tests runs everywhere and can provide us guarantees.)
This PR has extensive indexing calculation. So + @zhanglx13 to double check too.