
[QST] Triton MLIR

jeromeku opened this issue 11 months ago • 8 comments

@srush

Always appreciate your wonderful OSS educational contributions!

I'm relatively familiar with CUDA and Triton, but less so with machine learning compilers, and am interested in getting into the weeds of Triton's compilation pipeline.

I've come across a few resources for learning MLIR as well as related projects such as TVM (which has a comprehensive set of tutorials / learning materials spearheaded by Tianqi Chen of CMU), but have yet to bridge the gap from basic MLIR to something on the scale of Triton.

The overarching motivation -- other than the fact that ML compilers are super-interesting :) -- is that in a world of increased demand for ML training / inference but limited GPU (NVIDIA) supply, the ability to write code that is backend-agnostic is ever more important.

A few questions:

  • Are you aware of any resources for learning MLIR incrementally, ideally building from basics to something like a toy triton, and more ambitiously, understanding enough of the triton backend to be able to contribute new optimization passes?
  • Is this something you're interested in and would possibly like to collaborate on?

I'd be willing to do as much of the heavy lifting as needed:

  • I'd envision a step-by-step walkthrough of each of the Triton tutorials, starting with vec-add.
  • The goal would be to understand how each pass of the compilation pipeline translates high-level Python to performant device code.
  • Something that pulls apart each component of the C++ MLIR pipeline and provides greater visibility -- and hackability -- than simply observing the output of MLIR_ENABLE_DUMP (see the sketch after this list).
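
To make the starting point concrete, here is a minimal sketch of pulling out the intermediate IRs for a vec-add kernel. This assumes a CUDA device and a recent Triton release; the value returned by the launch and the exact keys in `.asm` (e.g. `ttir`, `ttgir`, `llir`, `ptx`) vary across versions, so treat this as a starting point rather than a stable API.

```python
# Sketch: dump Triton's intermediate IRs for a vector-add kernel.
# Assumes a CUDA device and a recent Triton (2.x/3.x); the launch return
# value and the .asm keys differ between versions.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


n = 1024
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)

grid = (triton.cdiv(n, 128),)
# In recent Triton versions the launch returns a compiled-kernel handle;
# its .asm dict holds a snapshot of the IR at each stage of the pipeline.
handle = add_kernel[grid](x, y, out, n, BLOCK_SIZE=128)

for stage in ("ttir", "ttgir", "llir", "ptx"):
    print(f"==== {stage} ====")
    print(handle.asm.get(stage, "<not captured in this version>"))
```

These per-stage snapshots complement MLIR_ENABLE_DUMP, which prints the module around each MLIR pass; the walkthrough would explain how one snapshot becomes the next.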

cc @Jokeren

jeromeku avatar Mar 20 '24 00:03 jeromeku

Hi Jerome,

Nice to hear from you. This seems like a very interesting project. It is a bit beyond my area of expertise, as I have only dabbled a bit in the internals of MLIR generation and lowering. My only worry is that this area seems to be evolving quite quickly, so there may not yet be stable enough foundations to document for lay users.

@Jokeren might have more to add.

srush avatar Mar 20 '24 01:03 srush

I doubt the Triton developers will find the time to craft documentation or develop tutorials.

However, there are a bunch of Chinese users out there who've been diving deep into Triton's code, breaking down every compiler pass in their blogs (in Chinese). Honestly, their enthusiasm has taken me by surprise... I've glanced over a few of these blogs and they're top-notch.

Jokeren avatar Mar 20 '24 01:03 Jokeren

@Jokeren: Which blogs are you referring to? I'd certainly be interested in taking a gander!

@srush: no worries -- let me see how far I can take this on my own and will be happy to share any progress.

jeromeku avatar Mar 20 '24 01:03 jeromeku

Wait, do you read Chinese? I meant that they're written in Chinese. Or do you plan to translate?

Jokeren avatar Mar 20 '24 01:03 Jokeren

Plan to translate -- the wonders of multilingual LLMs (or Google Translate, for that matter). I am also Chinese.

jeromeku avatar Mar 20 '24 02:03 jeromeku

That's great. Feel free to search with the keyword "triton" on zhihu.com :)

That's what I'm referring to.

Jokeren avatar Mar 20 '24 02:03 Jokeren

@Jokeren

Thanks -- yes, I've come across many of those already. Many deep dives into all things CUDA / CUTLASS there as well, haha.

Are you planning on teaching a course on MLIR / deep learning compilers at your university?

jeromeku avatar Mar 20 '24 02:03 jeromeku

Are you planning on teaching a course on MLIR / deep learning compilers at your university?

Maybe not. Unfortunately, they didn't assign me to a compiler course.

Jokeren avatar Mar 20 '24 03:03 Jokeren