lightning-thunder Enable NvtxProfileTransform by default

🚀 Feature

Add NvtxProfileTransform to the default set of transforms.

Motivation

NvtxProfileTransform adds nvtx around each thunder op, which is great for understanding what thunder is spending its time doing.

Pitch

Anyone that wants to look at performance wants this transform on. If you're not looking at perf, the overhead should be negligible.

Alternatives

detect if we're running in the profiler (is this possible?) and turn it on automagically
automagically enable it when the THUNDER_ANNOTATE_TRACES environment variable is set, which has precedence as doing something similar
enable it but automagically disable it when an environment variable is set

Additional context

It was pointed out that seeing a bunch of 'nvtx' nodes in a trace might be annoying for people developing their own transforms--it may feel like noise that obscures what's happening in the trace.

Sep 23 '24 22:09 tfogal

So I'm still not 100% sure about the motivation: What is the harm in the status quo that this will be fixing? Currently, the task would be to have thunder.jit(..., transforms=(NvtxProfileTransform, ...)) and we're trying to save the passing of NvtxProfileTransform? We do want the (power) user to specify transforms they want. I must admit that it sets of my "too much magic" alert. In the end, the NvtxProfileTransform is useful for performance analysis, but does nothing for "casual use". That said, I'd be all for adding it to all the benchmarking runs or other places where we believe that for the time being the main use will be profiling.

Sep 24 '24 07:09 t-vi

I'm not quite convinced because to me it sounds like an addition for developers while we could do the same thing with a few lines IIUC.

Sep 24 '24 08:09 crcrpar

I agree with both, the trace is chatty enough right now and I don't see a big hurdle in adding the transform manually.

What @csarofeen was suggesting (activating if we detect profiling env vars) may be a way forward if this is felt like a strong need.

Sep 24 '24 09:09 lantiga

Yeah, I hear you. Certainly adding something to the trace can't make it simpler ;-)

To answer the 'why' question: the point was about not requiring users to learn about the transforms arg / delve into our documentation: in a large stack of software someone needs to understand what thunder is, search for it, find docs and then modify what might be a library call to thunder.jit. If libraries (not just ours) would just insert ranges by default things would be a lot easier.

But I see I'm in the minority here :-). I can just manually add it.

Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend? That interface is aimed at people who are not developers / wouldn't have any reason to look at a thunder trace anyway, so presumably the 'noise' added here wouldn't be such a big deal.

Sep 25 '24 00:09 tfogal

the point was about not requiring users to learn about the transforms arg / delve into our documentation

as we have a page about examine, my initial impression is like why not having another about the transform in question.

Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend?

I would be so.

A few quick questions in my mind.

would it be possible to give context managers to a BoundSymbol?
If this transform is enabled by default, how would it interact with nvfuser executor giving nvtx ranges to each fusion region by default https://github.com/Lightning-AI/lightning-thunder/blob/e65dae053137afee9cb5b81b43d264a59128e491/thunder/executors/nvfuserex_impl.py#L448?

Sep 25 '24 00:09 crcrpar

Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend?

I think this would fit well with the philosophy of thunder.jit staying "thin" and providing flexibility and building more elaborate interfaces on top.

Sep 25 '24 06:09 t-vi