triton icon indicating copy to clipboard operation
triton copied to clipboard

[FRONTEND] Enable debug mode only when the options say so

Open superbobry opened this issue 1 year ago • 11 comments

Prior to this commit make_* functions in the CUDABackend enabled debug mode unconditionally. This led to slower-than-necessary compile times, even when the caller passed CUDAOptions with debug=False.

I mirrored this change in HIPBackend for consistency.

superbobry avatar Feb 14 '24 13:02 superbobry

When will options.debug be True?

Jokeren avatar Feb 14 '24 14:02 Jokeren

I am using Triton via jax_triton, so I am not aware of the idiomatic Triton way of doing that. In jax_triton users can construct cb.CUDAOptions(..., debug=True) and pass that to each of the cb.CUDABackend.make_* functions.

superbobry avatar Feb 14 '24 15:02 superbobry

I am using Triton via jax_triton, so I am not aware of the idiomatic Triton way of doing that. In jax_triton users can construct cb.CUDAOptions(..., debug=True) and pass that to each of the cb.CUDABackend.make_* functions.

I'll think about how to refactor this. The debugging information was left on purpose. Some other users may want the debugging information there by default.

Jokeren avatar Feb 14 '24 15:02 Jokeren

Thanks for reporting the increase in compilation time. That's definitely a concern

Jokeren avatar Feb 14 '24 15:02 Jokeren

I appreciate that some users might prefer to see debug information, but I would recommend making debugging opt-in and documenting the process of enabling it on RTD. It could be as easy as setting an env variable.

To add a bit more context, we observe very slow compilation times internally with debug mode on and llvm/llvm-project#78228. LLVMTranslationInterface emits a diagnostic for each FuncOp parameter (because of the tt.divisibility attribute), and diagnostics are expensive with printStackTraceOnDiagnostic enabled.

superbobry avatar Feb 14 '24 20:02 superbobry

I appreciate that some users might prefer to see debug information, but I would recommend making debugging opt-in and documenting the process of enabling it on RTD. It could be as easy as setting an env variable.

It will need more work to figure out the names of each "debug" mode. Right now TRITON_DEBUG is mainly used to enable/disable assert only. And there are many additional debug flags in triton for different purposes.

Jokeren avatar Feb 14 '24 22:02 Jokeren

How about TRITON_DEBUG_COMPILE or even just re-using TRITON_DEBUG to decide whether to enable debug mode in pass manager?

superbobry avatar Feb 14 '24 23:02 superbobry

How about TRITON_DEBUG_COMPILE or even just re-using TRITON_DEBUG to decide whether to enable debug mode in pass manager?

I think we do want to separate these debugging modes. For example, it doesn't make sense if the user only want lineinfo while the compiler also adds device assert, if debugging modes are mixed.

Jokeren avatar Feb 14 '24 23:02 Jokeren

We also noticed this issue on our internal version of Triton proper, running a newer LLVM version. test_core.py crashes after trying to emit too many warnings about tt.divisibility being unhandeled (coming from here: https://github.com/llvm/llvm-project/commit/fa6850a9981b65972294e13021f82b96d460b3ec#diff-c255e7b32bf400dde3e5bc44ba9aaf2e8789f02a70cf080de1877f925accd45bR117). This will become an issue here as well, once we want to update to a newer LLVM version.

gflegar avatar Feb 15 '24 10:02 gflegar

We also noticed this issue on our internal version of Triton proper, running a newer LLVM version. test_core.py crashes after trying to emit too many warnings about tt.divisibility being unhandeled (coming from here: https://github.com/llvm/llvm-project/commit/fa6850a9981b65972294e13021f82b96d460b3ec#diff-c255e7b32bf400dde3e5bc44ba9aaf2e8789f02a70cf080de1877f925accd45bR117). This will become an issue here as well, once we want to update to a newer LLVM version.

If only tt.divisibility attribute is a problem, can we implement LLVMTranslationDialectInterface and then just emit success() when encountering tt.divisibility?

Jokeren avatar Feb 15 '24 13:02 Jokeren

Alternatively, we can remove tt.divisibility during the conversion from Triton::FuncOp to LLVM::FuncOp. Let me know if any of the solutions makes sense :)

Jokeren avatar Feb 15 '24 13:02 Jokeren

The "fix" I used internally for now is https://github.com/openai/triton/commit/e3259831bfcbaef86143524bab1ce05bf5186387. Would this be something you're interested in upstreaming? This allows us to also speed up compile-times since we're not processing all of these warnings unless requested by the user, and can use multithreading.

But probably we should do both what you suggest, and do something with tt.divisibility to not emit it, and the commit I linked. Sergey also has a thread on Slack to discuss what the best thing would be to do with tt.divisibility here.

gflegar avatar Feb 16 '24 13:02 gflegar

The "fix" I used internally for now is https://github.com/openai/triton/commit/e3259831bfcbaef86143524bab1ce05bf5186387. Would this be something you're interested in upstreaming?

Seems good to me!

Jokeren avatar Feb 16 '24 14:02 Jokeren

The "fix" I used internally for now is e325983. Would this be something you're interested in upstreaming? This allows us to also speed up compile-times since we're not processing all of these warnings unless requested by the user, and can use multithreading.

But probably we should do both what you suggest, and do something with tt.divisibility to not emit it, and the commit I linked. Sergey also has a thread on Slack to discuss what the best thing would be to do with tt.divisibility here.

Is the main problem that the recent change https://github.com/openai/triton/commit/48034034c86510a64d99a6bbf8ebcf0f1487681c started disabling multithreading all the time instead of only when dumping? I'm guessing that's what is causing the performance problem, we should definitely fix it

ThomasRaoux avatar Feb 16 '24 17:02 ThomasRaoux

#3147 puts diagnostics behind an environment variable

gflegar avatar Feb 19 '24 10:02 gflegar

@gflegar's PR should solve the problem. I'm closing the PR, feel free to reopen it if you think something is still needed.

ThomasRaoux avatar Feb 19 '24 17:02 ThomasRaoux