Enable NvtxProfileTransform by default
🚀 Feature
Add NvtxProfileTransform to the default set of transforms.
Motivation
NvtxProfileTransform adds nvtx around each thunder op, which is great for understanding what thunder is spending its time doing.
Pitch
Anyone that wants to look at performance wants this transform on. If you're not looking at perf, the overhead should be negligible.
Alternatives
- detect if we're running in the profiler (is this possible?) and turn it on automagically
- automagically enable it when the
THUNDER_ANNOTATE_TRACESenvironment variable is set, which has precedence as doing something similar - enable it but automagically disable it when an environment variable is set
Additional context
It was pointed out that seeing a bunch of 'nvtx' nodes in a trace might be annoying for people developing their own transforms--it may feel like noise that obscures what's happening in the trace.
So I'm still not 100% sure about the motivation: What is the harm in the status quo that this will be fixing?
Currently, the task would be to have thunder.jit(..., transforms=(NvtxProfileTransform, ...)) and we're trying to save the passing of NvtxProfileTransform?
We do want the (power) user to specify transforms they want.
I must admit that it sets of my "too much magic" alert. In the end, the NvtxProfileTransform is useful for performance analysis, but does nothing for "casual use".
That said, I'd be all for adding it to all the benchmarking runs or other places where we believe that for the time being the main use will be profiling.
I'm not quite convinced because to me it sounds like an addition for developers while we could do the same thing with a few lines IIUC.
I agree with both, the trace is chatty enough right now and I don't see a big hurdle in adding the transform manually.
What @csarofeen was suggesting (activating if we detect profiling env vars) may be a way forward if this is felt like a strong need.
Yeah, I hear you. Certainly adding something to the trace can't make it simpler ;-)
To answer the 'why' question: the point was about not requiring users to learn about the transforms arg / delve into our documentation: in a large stack of software someone needs to understand what thunder is, search for it, find docs and then modify what might be a library call to thunder.jit. If libraries (not just ours) would just insert ranges by default things would be a lot easier.
But I see I'm in the minority here :-). I can just manually add it.
Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend? That interface is aimed at people who are not developers / wouldn't have any reason to look at a thunder trace anyway, so presumably the 'noise' added here wouldn't be such a big deal.
the point was about not requiring users to learn about the
transformsarg / delve into our documentation
as we have a page about examine, my initial impression is like why not having another about the transform in question.
Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend?
I would be so.
A few quick questions in my mind.
- would it be possible to give context managers to a
BoundSymbol? - If this transform is enabled by default, how would it interact with nvfuser executor giving nvtx ranges to each fusion region by default https://github.com/Lightning-AI/lightning-thunder/blob/e65dae053137afee9cb5b81b43d264a59128e491/thunder/executors/nvfuserex_impl.py#L448?
Are people more comfortable with enabling it by default in the hidden thunder.jit that happens inside the dynamo frontend?
I think this would fit well with the philosophy of thunder.jit staying "thin" and providing flexibility and building more elaborate interfaces on top.