
ability to access Taichi IR strings for specific kernels from within scripts

bcolloran opened this issue 3 years ago · 4 comments

Concisely describe the proposed feature

It would be very convenient to be able to access the Taichi IR of specific kernels in a fine-grained way. The only way to access the Taichi IR that I see documented, in the Intermediate Representation (IR) section of the docs, is ti.init(print_ir=True), and the docs note that this prints the IR of all instantiated kernels.
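For concreteness, a minimal sketch of that global option (the kernel and field here are just toy examples):

import taichi as ti

ti.init(arch=ti.cpu, print_ir=True)  # global switch: dumps the IR of *every* compiled kernel

x = ti.field(ti.f32, shape=8)

@ti.kernel
def foo():
    for i in x:
        x[i] = i * 2.0

foo()  # kernels are compiled on first call; the IR goes to stdout, mixed with everything else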

To help debug an issue I was having recently, it would have been very convenient to be able to access the IR of just one specific kernel and write it to a file.

Describe the solution you'd like (if any)

If there were an API that allowed me to access the IR as a string, I could write out the file myself, and I believe that having access to these strings would also be useful for other advanced debugging use cases.
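Something along these lines, where ti.ir_string is a made-up name purely to illustrate the shape of the API I have in mind:

ir_text = ti.ir_string(foo)  # hypothetical: return the Taichi IR of kernel foo as a str

with open("foo_ir.txt", "w") as f:
    f.write(ir_text)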

bcolloran avatar Mar 15 '22 01:03 bcolloran

One possible solution: we introduce a "kernel decorator" system, something like one of the options below:

@ti.kernel(print_ir=True)
def foo():
    ...

# or 

@ti.kernel_with_config(print_ir=True)
def foo():
    ...

# or

@ti.kernel_config(print_ir=True)
@ti.kernel
def foo():
    ...

We can also add an option to serialize the whole kernel, and even set the optimization level for a specific kernel. I can imagine going as far as something like

@ti.kernel(print_ir=True, serialize=True, opt_level=2, debug=True, arch=ti.cpu)
def foo():
    ...

This would unlock a lot of fancy use cases and benefit a lot of users.


What do you think? :-) @bcolloran @k-ye

Btw, @bcolloran, given you know Taichi sooo well (diving really deep into the IR system), would you be interested in joining us in developing this feature? We probably need a bit more design discussion to reach an API consensus, and then we will head for development :-) Thank you, and we look forward to your participation!

yuanming-hu avatar Mar 15 '22 01:03 yuanming-hu

I quite like the idea of being able to add options to the ti.kernel decorator @yuanming-hu. This does seem like a very extensible path to a lot of interesting use cases in the future!

As I think about it more, perhaps one of the challenges with the current behavior of ti.init(print_ir=True) is that it just writes everything to stdout. You can dump all of that stdout output to a file, but it's not very flexible. Being able to specify print_ir=True at the kernel level would be a big improvement and much more flexible, but if it still writes that kernel's IR to stdout, it might not be ideal for all cases (it would still be a big incremental improvement if that is the easiest path forward!). But I guess I was thinking of something more like a function ti.ir_string(foo) that would enable me to get the string representation of each kernel's IR so that I can do whatever I want with it from within Python -- write the IR to a file, compare the IR of kernels as strings, or do some other operations.
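For example, diffing two kernels' IR with the standard library, again assuming the hypothetical ti.ir_string:

import difflib

ir_a = ti.ir_string(kernel_a)  # hypothetical API, as above
ir_b = ti.ir_string(kernel_b)

diff = difflib.unified_diff(ir_a.splitlines(), ir_b.splitlines(),
                            "kernel_a", "kernel_b", lineterm="")
print("\n".join(diff))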


@yuanming-hu, I would be thrilled to be able to contribute to Taichi, I think it's an incredible project and I love using it! But I cannot pretend to know Taichi better than I really do -- I have only dived pretty shallowly actually, just enough to want to diff the IR, but not enough to understand it. And I'm new to GPU compute in general (outside of Jax/TF/Torch, which hide all the details). But hopefully one day! :-)

bcolloran avatar Mar 15 '22 16:03 bcolloran

But I guess I was thinking of something more like a function ti.ir_string(foo) that would enable me to get the string representation of each kernel's IR so that I can do whatever I want with it from within Python -- write the IR to a file, compare the IR of kernels as strings, or do some other operations.

I agree that ti.ir_string would seem like an ideal API for this. One issue is that, due to Taichi templates, a single Python function may correspond to multiple compiled kernels, so printing the IR may need more information to identify a particular instantiation (e.g., the template arguments).
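For example (a toy templated kernel to illustrate the ambiguity):

@ti.kernel
def fill(f: ti.template(), v: ti.f32):
    for i in ti.grouped(f):
        f[i] = v

a = ti.field(ti.f32, shape=8)
b = ti.field(ti.f32, shape=(8, 8))
fill(a, 1.0)  # instantiation #1
fill(b, 2.0)  # instantiation #2: same Python function, different IR

A hypothetical ti.ir_string(fill) would be ambiguous here; it might need something like ti.ir_string(fill, template_args=(a,)) (again, a made-up signature) to pick one instantiation.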

@yuanming-hu, I would be thrilled to be able to contribute to Taichi, I think it's an incredible project and I love using it! But I cannot pretend to know Taichi better than I really do -- I have only dived pretty shallowly actually, just enough to want to diff the IR, but not enough to understand it. And I'm new to GPU compute in general (outside of Jax/TF/Torch, which hide all the details). But hopefully one day! :-)

Contributing is actually not so difficult :-) Perhaps the most challenging part is setting up a developer environment and knowing what to contribute (which we have already figured out in this thread).

We will probably have a bit more discussion here so that we know what the API should look like. The task may then get broken down into small pieces, and you are more than welcome to take a piece of this feature :-)

yuanming-hu avatar Mar 15 '22 16:03 yuanming-hu

Thanks, I also find it useful to have a per-kernel config. I favor the third option, because it's 1) the least intrusive change and 2) aligned with loop_config.

@ti.kernel_config(print_ir=True, ...)
@ti.kernel
def foo():
    ...

In addition, we can start thinking about which configs are global vs per-kernel vs per-loop.
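Roughly, the three levels would look like this (ti.init and ti.loop_config already exist; ti.kernel_config is the hypothetical piece from this thread):

ti.init(arch=ti.cpu, debug=True)    # global: applies to the whole program

@ti.kernel_config(print_ir=True)    # per-kernel: hypothetical, proposed above
@ti.kernel
def foo():
    ti.loop_config(serialize=True)  # per-loop: existing API; must directly precede the loop
    for i in range(16):
        ...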

But I guess I was thinking of something more like a function ti.ir_string(foo) that would enable me to get the string representation of each kernel's IR so that I can do whatever I want with it from within Python -- write the IR to a file, compare the IR of kernels as strings, or do some other operations.

I feel like this is quite related to the AOT infrastructure. If we consider this problem the LLVM way, an llvm::Module can be iterated over and printed. Could we do similar stuff with the AOT module as well?
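Sketching that on the Python side (ti.aot.Module and add_kernel exist today, though the exact API has been evolving; the kernels()/ir_string() iteration below is hypothetical, mirroring how one walks an llvm::Module and prints each function):

m = ti.aot.Module(ti.vulkan)
m.add_kernel(foo)

# hypothetical iteration API, analogous to `for (auto &F : M) F.print(...)` in LLVM:
for k in m.kernels():
    print(k.name, k.ir_string())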

k-ye avatar Mar 17 '22 03:03 k-ye