matteochen

5 comments by matteochen

Is this still an open discussion, or do we have a preferred approach? Given #2437, maybe unifying everything under option 1 is clearer in terms of UX?

nvFuser segmentation behaviour could increase the difficulty of a possible `update_aliases` within fusion regions. For example, the test function:

```python
def f(a, b):
    result = a.exp_() * b.tanh_()
    a.add_(1)
    return...
```

Just to clarify, do you want to remove the bound symbol if the output proxy is not used, even though a sub symbol has the `DONT_DCE` tag?

The CUDA Graph static memory model is the likely cause of this behaviour. Apart from the static inputs (parameters, etc.), we allocate static buffers for graph replay: https://github.com/Lightning-AI/lightning-thunder/blob/ab067fdfbe11fa83b779d7b09ad8c8c1ccf24dc0/thunder/transforms/cudagraph.py#L93 Profiling a sample model, we...

> Excellent profiling work and attention to details, great deep dive!
>
> Just one small clarification: the conclusion that the copied input to backward is the output of forward...