[USMP] Initial implementation of liveness analysis for Relax + TIR

Open gigiblender opened this issue 2 years ago • 6 comments

This PR adds an initial implementation of liveness analysis of tensors/buffers for Relax and TIR programs.

@areusch @mbaret @YuchenJin @mikepapadim

gigiblender avatar Sep 15 '22 12:09 gigiblender

Thanks @gigiblender for integrating USMP into Relax!

One idea about the liveness analysis pass: we can have a memory lifting pass which lifts the memory allocations in TIR into Relax first. This would allow the liveness analysis pass to analyze only the Relax functions, without needing to analyze the TIR PrimFuncs in the IRModule. Would love to hear your thoughts. 😄

One suggestion for the test case construction: we encourage developers to use the block_builder and emit_te API to construct the IRModule if the TVMScript is very long, for example: https://github.com/tlc-pack/relax/blob/relax/tests/python/relax/test_transform_fuse_ops.py#L51-L57. This will make the test case more concise.

YuchenJin avatar Sep 21 '22 18:09 YuchenJin

Thanks @YuchenJin!

> One idea about the liveness analysis pass: we can have a memory lifting pass which lifts the memory allocations in TIR into Relax first, and this will allow the liveness analysis pass to analyze only the Relax functions without the need to analyze the TIR Primfuncs in the IRModule. Would love to hear your thoughts. 😄

One challenge we have with lifting allocs is that if a TIR PrimFunc has two internal allocs whose lifetimes don't overlap, then we wouldn't be able to detect that solely by looking at Call(relax.builtin.alloc_tensor. However, I think that we might want to iterate on this PR to derive liveness based on first/last usage rather than just alloc nodes, so maybe this is less of a concern.
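To make the "first/last usage" idea concrete, here is a minimal sketch in plain Python. It does not use any TVM or Relax API: the op trace, the buffer names, and the helpers `liveness_intervals` and `overlaps` are all hypothetical and exist only for illustration. It shows how two internal buffers with disjoint lifetimes can be detected once liveness is derived from usage positions rather than from alloc nodes alone.

```python
def liveness_intervals(ops):
    """Map each buffer to (first_use_index, last_use_index).

    `ops` is a hypothetical linearized trace: a list of
    (op_name, [buffers touched by that op]) pairs.
    """
    intervals = {}
    for idx, (_, bufs) in enumerate(ops):
        for buf in bufs:
            first, _ = intervals.get(buf, (idx, idx))
            intervals[buf] = (first, idx)
    return intervals


def overlaps(a, b):
    """True if two closed intervals (start, end) share any position."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical body of one function with two internal temporaries.
ops = [
    ("conv", ["inp", "tmp_a"]),
    ("relu", ["tmp_a", "mid"]),
    ("pad", ["mid", "tmp_b"]),
    ("pool", ["tmp_b", "out"]),
]

intervals = liveness_intervals(ops)
can_share = not overlaps(intervals["tmp_a"], intervals["tmp_b"])
```

In this toy trace `tmp_a` and `tmp_b` are never live at the same time, so a planner could place them at the same offset. That information is only visible when the body of the function is inspected.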

areusch avatar Sep 21 '22 20:09 areusch

> one challenge we have with lifting allocs is that if a TIR PrimFunc has two internal allocs which don't overlap, then we wouldn't be able to detect that solely by looking at Call(relax.builtin.alloc_tensor.

Thanks @areusch! If we run the MetaSchedule tuning pass or other transformations/schedules first (which is usually the case, since memory planning happens at a later stage of the compilation), the temporary allocs inside a TIR PrimFunc will get removed, so usually there will not be multiple temporary allocs in a TIR PrimFunc. Would love to know the cases where there are several temporary allocs.

YuchenJin avatar Sep 22 '22 01:09 YuchenJin

Hm, I was thinking that you would see this case when doing multi-anchor fusion, though I haven't explored that enough yet to know. It does seem like there isn't anything in TIR preventing this case from happening, and if folks are writing custom TIR passes, it might not be sufficient to rely on MetaSchedule to reuse Buffers in TIR. With that said, this might not be as high a priority if MetaSchedule does do this.

I'm not sure resolving this question changes the approach of modifying the LivenessAnalysis to generate alloc/kill events based on usage. However, it's certainly a good thing to understand further.
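As a rough illustration of alloc/kill events generated from usage, the sketch below is plain Python and is not the LivenessAnalysis code from this PR. The input format and the `alloc_kill_events` helper are assumptions made for the example: a buffer gets an alloc event at the first step that touches it and a kill event at the last one.

```python
from collections import defaultdict


def alloc_kill_events(uses):
    """Build per-step alloc/kill events from a usage trace.

    `uses[i]` is the (hypothetical) list of buffers touched at step i.
    """
    first, last = {}, {}
    for idx, bufs in enumerate(uses):
        for buf in bufs:
            first.setdefault(buf, idx)  # keep the earliest step
            last[buf] = idx             # keep overwriting with the latest step

    events = defaultdict(lambda: {"alloc": [], "kill": []})
    for buf in first:
        events[first[buf]]["alloc"].append(buf)
        events[last[buf]]["kill"].append(buf)
    return dict(events)


uses = [["a"], ["a", "b"], ["b", "c"], ["c"]]
events = alloc_kill_events(uses)
```

The same helper applies whether the steps are Relax bindings or statements inside a PrimFunc, which is why the event-based approach does not depend on how the lifting question is resolved.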

areusch avatar Sep 22 '22 16:09 areusch

> hm, i was thinking that you would see this case when doing multi-anchor fusion. I haven't explored that enough yet to know, though. it does seem like there isn't anything in TIR preventing this case from happening though, and if folks are writing custom TIR passes, it might not be sufficient to rely on MetaSchedule to reuse Buffers in TIR. with that said, this might not be as high of a priority if MetaSchedule does do this.
>
> I'm not sure resolving this question changes the approach of modifying the LivenessAnalysis to generate alloc/kill events based on usage. However, it's certainly a good thing to understand further.

Yes, I agree it does not change the general approach. My thought is that if there are usually not multiple temporary allocs in a TIR PrimFunc, the liveness analysis pass would just need to traverse the Relax function after memory lifting, which would simplify the assumptions and greatly reduce the complexity of the liveness analysis pass. :)

YuchenJin avatar Sep 23 '22 18:09 YuchenJin

> My thought is if there are usually not multiple temporary allocs in a TIR PrimFunc, the liveness analysis pass would just need to traverse the Relax function after memory lifting, which would simplify the assumption and reduce the complexity of the liveness analysis pass by a lot. :)

Ethos-U is a motivator for this functionality, as it doesn't use MetaSchedule but does have multiple allocates in a single PrimFunc. Doing buffer consolidation on a per-PrimFunc basis will also generally be less efficient than doing it with global knowledge, where the memory fragmentation pattern is known.
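A toy comparison can illustrate the efficiency point. The sketch below is plain Python with a simple first-fit placement heuristic; it is not one of USMP's planning algorithms, and the buffer names, sizes, and live ranges are made up for the example. It plans the same workload twice: once with every buffer visible to a single planner, and once where the function's internal buffers are first consolidated into one opaque workspace that stays live for the whole call.

```python
def plan(buffers):
    """First-fit offset assignment, largest buffer first.

    `buffers` maps name -> (start, end, size), with closed live ranges.
    Returns (offsets, pool_size). Toy heuristic, for illustration only.
    """
    placed = []  # (start, end, offset, size) of buffers already placed
    offsets = {}
    for name, (start, end, size) in sorted(
        buffers.items(), key=lambda kv: -kv[1][2]
    ):
        # Address ranges taken by buffers that are live at the same time.
        taken = sorted(
            (off, off + sz)
            for (s, e, off, sz) in placed
            if s <= end and start <= e
        )
        offset = 0
        for lo, hi in taken:
            if offset + size <= lo:
                break  # fits in the gap below this range
            offset = max(offset, hi)
        offsets[name] = offset
        placed.append((start, end, offset, size))
    pool_size = max((off + sz for (_, _, off, sz) in placed), default=0)
    return offsets, pool_size


# Global view: internal buffers a and b are planned together with x.
global_view = {"a": (0, 1, 6), "b": (2, 3, 2), "x": (2, 5, 4)}
_, global_pool = plan(global_view)

# Per-function view: a and b collapse into one 6-byte workspace that is
# live for the whole call (steps 0..3), then x is planned around it.
per_func_view = {"workspace": (0, 3, 6), "x": (2, 5, 4)}
_, per_func_pool = plan(per_func_view)
```

With global knowledge, `x` can reuse the space freed by `a` while `b` sits next to it, giving a pool of 6. The per-function plan has to keep the whole workspace reserved while `x` becomes live, so it needs 10.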

mbaret avatar Oct 17 '22 15:10 mbaret