
Add the information about the set of auto-registered torch operators in CompileStats

Open kiya00 opened this issue 1 year ago • 8 comments

Before submitting
  • [ ] Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • [ ] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

As a follow-up to #811, this PR adds a new record, auto_registered_torch_operators, to CompileStats to track this information, so that the user can query the set of automatically registered torch operators with thunder.compile_stats(jitted_fn).auto_registered_torch_operators.
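
For illustration, a minimal usage sketch of the proposed query (hedged: the auto_registered_torch_operators field is what this PR proposes, so it is not available before this change; thunder.jit and thunder.compile_stats are existing APIs):

```python
import torch
import thunder

def foo(x):
    # torch.special.erf has no dedicated thunder symbol, so it would be
    # auto-registered from the default torch operator list during tracing
    return torch.special.erf(x)

jfoo = thunder.jit(foo)
jfoo(torch.randn(4))

stats = thunder.compile_stats(jfoo)
# Proposed field from this PR; expected to hold the auto-registered torch ops
print(stats.auto_registered_torch_operators)
```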

Requirement: fix the bug with torch.special operators that was found along the way (https://github.com/Lightning-AI/lightning-thunder/pull/979).

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

kiya00 avatar Aug 16 '24 10:08 kiya00

Thank you for the PR @kiya00! I can see how allowing the user to query for this is good, but I'm a bit hesitant about this approach. Could the same not be achieved by inspecting the last traces? If so, I would prefer not to collect another set of info.

t-vi avatar Aug 16 '24 10:08 t-vi

Hi @t-vi, in the forward trace an auto-registered operator is recorded the same way as a thunder.torch symbol or a torch operator (in the execution trace), but in the backward trace an auto-registered operator gets a special symbol named opname_vjp. Besides that, thunder.examine gives information like:

Found 3 distinct operations, of which 3 (100.0%) are supported
Note 2 operators are automatically registered:
special_gammaln of torch._C._special
special_erf of torch._C._special
The function appears to be working as expected

So the traces alone can't show this exactly (though thunder.examine can give the info). Do you think it's necessary to add it to CompileStats?
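
For reference, a sketch of how a report like the one quoted above can be produced (assuming thunder.examine.examine is the entry point; the exact report text may differ):

```python
import torch
from thunder.examine import examine

def fn(x):
    return torch.special.erf(x) + torch.special.gammaln(x)

# Prints a support report; torch ops without dedicated thunder symbols are
# listed as automatically registered, as in the output quoted above
examine(fn, torch.randn(4))
```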

kiya00 avatar Aug 16 '24 12:08 kiya00

I'm not quite understanding this. To my mind, a hacky way to get the auto symbols in the trace would be {bsym.sym for bsym in lt.bound_symbols if bsym.sym.meta.__code__ in thunder.torch.meta_adaptor.__code__.co_consts} for the forward. And if we wanted, we could keep a set of all auto-registered symbols and compare {bsym.sym in thunder.torch._auto_registered_symbols}. But maybe I'm missing something?

t-vi avatar Aug 16 '24 13:08 t-vi

If we just look at the text form of the traces it's hard to tell, but at the symbol.meta level we can always see whether an op is auto-registered, and all of these ops are listed in thunder/torch/default_torch_ops.py. Keeping the auto-registered ops in CompileStats is more for non-advanced users, giving them a direct look at what's going on. It depends on what kind of UX we want; what do you think, @IvanYashchuk?

kiya00 avatar Aug 16 '24 13:08 kiya00

Thank you for the PR @kiya00! I can see how allowing the user to query for this is good, but I'm a bit hesitant about this approach. Could the same not be achieved by inspecting the last traces? If so, I would prefer not to collect another set of info.

If we think that eagerly collecting statistics is burdensome, then we could add some architecture to support the lazy collection of statistics. This lazy collection could work by reviewing the traces associated with a statistics object. That sounds like a fine idea, but also an architectural change that is probably for another PR.

The cost of this particular collection seems very small (the call to check if an operator is auto-registered, and then the addition to a set).
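
If lazy collection were added later, it could look roughly like the following (a sketch only: the real CompileStats class has many more fields, and this property does not exist today):

```python
from functools import cached_property

class CompileStatsSketch:
    """Simplified stand-in for CompileStats, for illustration only."""

    def __init__(self, traces, auto_registry):
        self.traces = traces                # traces recorded during compilation
        self.auto_registry = auto_registry  # assumed set of auto-registered symbols

    @cached_property
    def auto_registered_torch_operators(self) -> set[str]:
        # Derived lazily from the stored traces on first access,
        # instead of being recorded eagerly during tracing
        last = self.traces[-1]
        return {b.sym.name for b in last.bound_symbols if b.sym in self.auto_registry}
```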

mruberry avatar Aug 16 '24 15:08 mruberry

If we think that eagerly collecting statistics is burdensome, then we could add some architecture to support the lazy collection of statistics. This lazy collection could work by reviewing the traces associated with a statistics object. That sounds like a fine idea, but also an architectural change that is probably for another PR.

I have a strong preference for not adding collection of statistics that we can obtain without collecting. I'm totally cool with providing users a convenient way to access this, but I do think we should do this by having a function that grabs the traces and looks at them.

The cost of this particular collection seems very small (the call to check if an operator is auto-registered, and then the addition to a set).

It also adds a context-var lookup to every function call during tracing. We probably already have a dozen of those, but I think they are one part of what's slow (there are certainly more, and many of them I put in myself).
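
For context, the kind of per-call lookup being referred to; this is an assumed shape for illustration, not thunder's actual internals:

```python
import contextvars

# A ContextVar consulted on every traced call to decide whether to record the
# statistic; the .get() below is the lookup that adds per-call overhead
_collect_auto_registered = contextvars.ContextVar("collect_auto_registered", default=False)

def record_if_enabled(sym, stats_set):
    if _collect_auto_registered.get():
        stats_set.add(sym)
```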

t-vi avatar Aug 16 '24 18:08 t-vi

If we think that eagerly collecting statistics is burdensome, then we could add some architecture to support the lazy collection of statistics. This lazy collection could work by reviewing the traces associated with a statistics object. That sounds like a fine idea, but also an architectural change that is probably for another PR.

I have a strong preference for not adding collection of statistics that we can obtain without collecting. I'm totally cool with providing users a convenient way to access this, but I do think we should do this by having a function that grabs the traces and looks at them.

We can probably do that without too much of an issue. @kiya00, what would you think of lazily populating this statistic by inspecting the first trace?

The cost of this particular collection seems very small (the call to check if an operator is auto-registered, and then the addition to a set).

It also adds a context-var lookup to every function call during tracing. We probably already have a dozen of those, but I think they are one part of what's slow (there are certainly more, and many of them I put in myself).

Would you file an issue for this? I wouldn't worry about the performance before we measure it.

mruberry avatar Aug 16 '24 19:08 mruberry

We can probably do that without too much of an issue. @kiya00, what would you think of lazily populating this statistic by inspecting the first trace?

Hi @t-vi, @mruberry! Sure, I can instead create a function like def get_auto_registered_torch_ops(fn) -> set[str] | None to retrieve the set of auto-registered torch ops from the symbols in the trace, as @t-vi mentioned.
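
A possible shape for that helper, based on the trace-inspection idea in this thread (a hedged sketch: thunder.last_traces and bound-symbol attributes are existing APIs, while the _auto_registered_symbols registry is the hypothetical set discussed above and may not exist under that name):

```python
import thunder
import thunder.torch

def get_auto_registered_torch_ops(fn) -> set[str] | None:
    # Return None if the jitted callable has not produced any traces yet
    traces = thunder.last_traces(fn)
    if not traces:
        return None
    # Assumed registry of auto-registered symbols, as suggested earlier in the thread
    registry = getattr(thunder.torch, "_auto_registered_symbols", set())
    return {bsym.sym.name for bsym in traces[-1].bound_symbols if bsym.sym in registry}
```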

kiya00 avatar Aug 20 '24 09:08 kiya00