triton icon indicating copy to clipboard operation
triton copied to clipboard

[AMD][WMMA] Support dot3d

Open binarman opened this issue 10 months ago • 7 comments

This PR enables support of 3d dot and fixes tests in test_core.py

binarman avatar Apr 16 '24 15:04 binarman

+cc @joviliast

binarman avatar Apr 16 '24 15:04 binarman

LGTM Thank you for this PR.

Have you run test_dot locally on navi ?

joviliast avatar Apr 16 '24 17:04 joviliast

Have you run test_dot locally on navi ?

yes

  • fp16->fp32 tests pass
  • fp16->fp16 some tests pass, some fail due to mismatches (few values exceed tolerance threshold)
  • fp32->fp32 are not supported in WMMA, they go through FMA pipeline and fail
  • int8 tests do not pass, but this is expected, test_dot do not work at the moment as well

binarman avatar Apr 16 '24 18:04 binarman

@antiagainst hi! From my experience, most of dot issues are related to wrong indexing/address computations, which requires large and complex lit test to check. Such test will be extremely fragile, very complex and hard to read.

@joviliast is working on same code, so even if I add lit test, it will probably break in near future adding more redundant work to him(or me, it depends who will merge changes first).

I can implement some basic test, which will check that there are no crashes, but in my opinion this test does not guarantee much.

P.s. We have some basic llir interpreter which can help checking changes from this PR, but at this point it requires some massive work. I prefer to invest time in this task, if correctness on Navi aligns with our team priorities.

binarman avatar May 01 '24 14:05 binarman

@antiagainst PTAL

binarman avatar May 13 '24 14:05 binarman

Regarding tests, I treat it as what we want to invest to guard against future breakages. We don't have RDNA CI; so this can easily regress. Compared to the efforts spent on writing some tests now (which is mostly one time), I'm more concerned about the potential time lost on debugging all these complex logic in the future only via integration python tests in a sense. And we don't know how many regressions we will see throughout the journey.

Also btw lit tests don't need to be super detailed and cover all the lines; we can just cover important parts so it's not a change detector. I don't think it's a lot of effort to update them, given that the index caculation doesn't change frequently I believe. And whatever we change there it's delibrate--it can help for folks touching the code to verify their changes too. (Keep in mind that there are contributors that only do MFMA parts--they will not run their changes on some RDNA cards to verify things pass. let alone folks only touching nvidia support. But a lit tests runs everywhere and can provide us guarantees.)

antiagainst avatar May 23 '24 03:05 antiagainst

This PR has extensive indexing calculation. So + @zhanglx13 to double check too.

antiagainst avatar May 23 '24 03:05 antiagainst