triton Support performance warning

We would like to emit performance warnings in the following cases: fail to pipeline a loop; could not use wgmma; register spillage; could not vectorize sub-word load/store; some ptxas wgmma message; load/store is not coalesced; etc.

Goals from discussions: The warnings will be annotated on the Triton .py source and the IR code, in the form of Remarks or Warnings. MLIR supports both; the only difference is the diagnostic level. We also want to emit performance warnings from the LLVM backend.

For each remark/warning, we want to show: the affected location in the kernel; the source of the decision in the compiler code; and the reason for the decision.
Adds additional warnings obtained when compiling from LLVM to assembly (e.g., register spillage, ptx performance warnings, etc.).
If the kernel is compiled with -Wno-perf-warnings, do not display anything. Otherwise, error (if -Werror) or display them as warnings.
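
As a rough illustration of the flag policy in the last goal, here is a minimal Python sketch. It is not part of this commit: the names PerfWarningMode, PerfWarningError, resolve_mode and report are hypothetical, and how the flags would actually reach the compiler from a .py workflow is still an open question (see below).

```python
import enum
import sys


class PerfWarningMode(enum.Enum):
    SILENT = "silent"  # -Wno-perf-warnings: display nothing
    WARN = "warn"      # default: display as warnings
    ERROR = "error"    # -Werror: promote to errors


class PerfWarningError(RuntimeError):
    """Hypothetical error type used when warnings are promoted to errors."""


def resolve_mode(flags):
    """Map command-line style flags to a reporting mode.

    -Wno-perf-warnings is checked first, matching the goal that
    nothing is displayed when it is present.
    """
    if "-Wno-perf-warnings" in flags:
        return PerfWarningMode.SILENT
    if "-Werror" in flags:
        return PerfWarningMode.ERROR
    return PerfWarningMode.WARN


def report(message, flags, stream=sys.stderr):
    """Silence, print, or raise for one performance warning."""
    mode = resolve_mode(flags)
    if mode is PerfWarningMode.SILENT:
        return
    if mode is PerfWarningMode.ERROR:
        raise PerfWarningError(message)
    print(f"warning: {message}", file=stream)
```

The same three-way decision could be driven by an environment variable instead of flags; only resolve_mode would change.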

This commit adds a performance warning for not selecting MMA v3 for tl.dot on Hopper. For the added test case, we will get:

test-warning.py:24:18: remark: Warning: can't use MMA V3 for the dot op
    c = tl.dot(a, b)
                 ^
test-warning.py:24:18: note: see current operation: %39 = tt.dot %37, %38, %cst, inputPrecision = tf32 : tensor<32x128xf16, #triton_gpu.dot_op<{opIdx = 0, parent = #triton_gpu.blocked<{sizePerThread = [2, 2], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>> * tensor<128x32xf16, #triton_gpu.dot_op<{opIdx = 1, parent = #triton_gpu.blocked<{sizePerThread = [2, 2], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>}>> -> tensor<32x32xf32, #triton_gpu.blocked<{sizePerThread = [2, 2], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>>

There are a few things that need to be discussed:

Where should the diagnostic handler be set up? We can add it in compiler.py in make_ttgir: diag = ir.source_mgr_diag(srcMgr, mod.context), or we can add it inside enable_debug: auto diag = SourceMgrDiagnosticHandler(srcMgr, context); It is not clear to me how pybind handles the lifetime of these variables, and the latter solution constructs and destroys SourceMgrDiagnosticHandler multiple times, so I chose the first option.
Where should we add the test case to detect the diagnostics? I can add an MLIR test case with -verify-diagnostics and expected-remark. Right now, I am adding a test case under python/tutorial temporarily; we will need a way to verify that diagnostics are emitted at the right source line.
Do we want Remarks or Warnings? For the LLVM backend, remarks may work better.
Support warning flags or use env variables? It is not clear to me how to support warning flags when building the .py source code.

May 15 '24 16:05 manman-ren

Where is this flag coming from "-Wno-perf-warnings"?

May 15 '24 17:05 Jokeren

Adds additional warnings obtained when compiling from LLVM to assembly (e.g., register spillage, ptx performance warnings, etc.).

Can you use remark to emit ptx warnings? Maybe not?

May 15 '24 17:05 Jokeren

python/tutorials/test-warning.py

We shouldn't have tests as tutorials :) Could you create a unit test that exercises the new codepaths and checks for the output on std::cerr?

May 16 '24 14:05 ptillet

Thanks for the comments! As described in the summary:

Where should we add the test case to detect the diagnostics? I can add an MLIR test case with -verify-diagnostics and expected-remark. Right now, I am adding a test case under python/tutorial temporarily; we will need a way to verify that diagnostics are emitted at the right source line. @ptillet Yeah, I added it to the tutorials temporarily for discussion. I will move it to the unit directory, which is tested via pytest, and I will try to figure out how to check against stdout there. If you have a test case that does something similar, that would be great!
Do we want Remarks or Warnings? For the LLVM backend, remarks may work better.
Support warning flags or use env variables? It is not clear to me how to support warning flags when building the .py source code. @Jokeren If we want to go with warning flags, I need to figure out how to support -Wno-perf-warnings when building .py code. Again, if anyone has any pointers, that would be great.

Can you use remark to emit ptx warnings? Maybe not?

You mean warnings from ptxas, not sure how to get them. But we can emit remarks when lowering from llvm to ptx.

CC @joker-eph: if you have any suggestions on some of these questions, that would be great. Thanks!

May 16 '24 15:05 manman-ren

You mean warnings from ptxas, not sure how to get them. But we can emit remarks when lowering from llvm to ptx.

My worry was that some warnings might be emitted when going from PTX to SASS. I'm OK with either remarks or warnings. I think remarks actually seem more informative.

May 16 '24 16:05 Jokeren

You mean warnings from ptxas, not sure how to get them.

You could capture the output of ptxas and regex/pattern match the output to catch these and report them as remarks?
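
A minimal sketch of that idea in Python, for discussion only. The line formats matched here are modeled on what ptxas prints in verbose mode and should be checked against the ptxas version in use; find_spills and the sample text in the usage note are hypothetical, not existing Triton code.

```python
import re

# Assumed shape of the verbose ptxas lines we care about.
FUNC_RE = re.compile(r"Function properties for (\S+)")
SPILL_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)


def find_spills(ptxas_output):
    """Return one remark string per function that reports register spills."""
    remarks = []
    current = "<unknown>"  # most recent function name seen
    for line in ptxas_output.splitlines():
        func = FUNC_RE.search(line)
        if func:
            current = func.group(1)
            continue
        spill = SPILL_RE.search(line)
        if spill:
            _stack, stores, loads = (int(g) for g in spill.groups())
            if stores or loads:
                remarks.append(
                    f"remark: {current}: register spill "
                    f"({stores} bytes spill stores, {loads} bytes spill loads)"
                )
    return remarks
```

Feeding it captured text such as "ptxas info    : Function properties for my_kernel" followed by "    32 bytes stack frame, 28 bytes spill stores, 28 bytes spill loads" would yield a single remark for my_kernel. Mapping that remark back to a source line in the .py file is not covered by this sketch.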

May 16 '24 21:05 joker-eph

Where should we add the test case to detect the diagnostics

test/unit/warnings.py sounds like a good place

I will try to figure out how to check against stdout there

Just to be clear, this should check against stderr
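
One possible shape for such a check, sketched in Python under some assumptions: the kernel is compiled in a child process so that diagnostics written to stderr by native code are captured as well, and the remark lines follow the file:line:col layout shown in the PR description. collect_remarks is a hypothetical helper, not an existing Triton or pytest API.

```python
import re
import subprocess
import sys

# Matches lines such as:
#   test-warning.py:24:18: remark: Warning: can't use MMA V3 for the dot op
REMARK_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): remark: (?P<msg>.*)$",
    re.MULTILINE,
)


def collect_remarks(script_source):
    """Run Python source in a child process and parse remarks from its stderr."""
    proc = subprocess.run(
        [sys.executable, "-c", script_source],
        capture_output=True,
        text=True,
        check=True,
    )
    return [
        (m["file"], int(m["line"]), int(m["col"]), m["msg"])
        for m in REMARK_RE.finditer(proc.stderr)
    ]
```

A unit test could then assert on the reported line and column, which also covers the requirement that the diagnostic lands on the right source line. Inside pytest, the capfd fixture is an alternative that captures at the file-descriptor level without a child process.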

May 17 '24 02:05 ptillet

@ptillet About using the warning flags -Wno-perf-warnings or -Werror: I feel they are not supported in the Python workflow "python test.py", right? So is the suggestion to supply the flags to triton-opt and still use env variables for the Python workflow? CC @joker-eph in case Mehdi has some comments.

May 17 '24 15:05 manman-ren

The tests are failing. I need to only enable the test for H100 with the env variable.

Jun 13 '24 23:06 manman-ren