`requires_grad` attribute for cache key
🐛 Bug
The cache key construction logic for Thunder's cache relies on the `requires_grad` attribute of `TensorProxy`, but it shouldn't:
https://github.com/Lightning-AI/lightning-thunder/blob/a8c063676cbecad7c196dcfe4fbeacf85775c9c5/thunder/common.py#L394-L400
Thunder does not propagate `requires_grad` information throughout the trace, so relying on it results in many unnecessary cache misses.
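To illustrate the effect, here is a minimal sketch (not Thunder's actual code; `ProxyMeta` and `make_key` are made up for the example) of how including `requires_grad` in the key splits otherwise-identical inputs into distinct cache entries:

```python
from collections import namedtuple

# Stand-in for the TensorProxy metadata that might feed a cache key.
ProxyMeta = namedtuple("ProxyMeta", ["shape", "dtype", "device", "requires_grad"])

def make_key(proxies):
    # Including requires_grad means two calls that are otherwise identical
    # hash to different keys whenever the flag differs.
    return tuple((p.shape, p.dtype, p.device, p.requires_grad) for p in proxies)

a = ProxyMeta((4, 4), "float32", "cuda", True)
b = ProxyMeta((4, 4), "float32", "cuda", False)

print(make_key([a]) == make_key([b]))  # False -> cache miss despite identical compute
```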
cc @mruberry @lantiga @IvanYashchuk
But how does the lack of propagation actually create indeterminism here?
What do you mean by indeterminism?
I found this problem in a very specific way, but I think it's a good starting point for understanding why I wrote this issue.
I was debugging an executor, specifically the creation of the joint forward and backward trace for two torch.nn.functional.linear calls one after the other. I noticed that even though the symbols were the same, the inputs were the same, and caching was enabled, the executor received two calls for the grad transformation of the symbol.
The reason for this was a cache miss here: https://github.com/Lightning-AI/lightning-thunder/blob/2d18b7a07fb5e83d138707b7bf97bd2f82da4f2c/thunder/core/vjp_utils.py#L58-L61
In my opinion that should not happen, because we do not track the `requires_grad` of a tensor throughout the trace's lifetime.
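Here is a rough sketch of the pattern I hit; the `get_grad_transform` helper and its key layout are hypothetical stand-ins for the lookup in vjp_utils, just to show why two identical linear calls end up requesting the grad transformation twice:

```python
grad_transform_cache = {}
grad_transform_calls = 0

def get_grad_transform(sym_name, input_metas):
    global grad_transform_calls
    # The key includes requires_grad, mirroring the lookup that misses.
    key = (sym_name, tuple((m["shape"], m["dtype"], m["requires_grad"]) for m in input_metas))
    if key not in grad_transform_cache:
        grad_transform_calls += 1  # the executor gets another grad-transform request
        grad_transform_cache[key] = f"grad rule for {sym_name}"
    return grad_transform_cache[key]

# Two linear calls with identical shapes and dtypes; only the (unpropagated)
# requires_grad flag differs, so the second lookup misses.
first = [{"shape": (8, 16), "dtype": "float32", "requires_grad": True}]
second = [{"shape": (8, 16), "dtype": "float32", "requires_grad": False}]
get_grad_transform("linear", first)
get_grad_transform("linear", second)
print(grad_transform_calls)  # 2, even though the compute is the same
```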
Another idea I've been considering is to delegate key creation to the executor. For example, the StatefulExecutor would benefit from custom key creation (say, when doing checkpointing, the proxies have the same name, and that should signal that the bound symbol in the backward is the same as the one encountered in the forward). A sketch of what I mean is below.
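A very rough sketch, assuming a hypothetical `make_cache_key` hook on executors (Thunder doesn't expose this today):

```python
class DefaultExecutor:
    def make_cache_key(self, bsym_name, proxies):
        # Default: structural key based on shape/dtype only.
        return (bsym_name, tuple((p["shape"], p["dtype"]) for p in proxies))

class StatefulExecutor(DefaultExecutor):
    def make_cache_key(self, bsym_name, proxies):
        # For checkpointing, keying on proxy names lets the backward pass
        # recognize a bound symbol already seen in the forward pass.
        return (bsym_name, tuple(p["name"] for p in proxies))

fwd_proxies = [{"name": "t0", "shape": (8, 16), "dtype": "float32"}]
bwd_proxies = [{"name": "t0", "shape": (8, 16), "dtype": "float32"}]

ex = StatefulExecutor()
print(ex.make_cache_key("linear", fwd_proxies) == ex.make_cache_key("linear", bwd_proxies))  # True
```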
So I think we'd be dropping the caching you link to after #1956, hopefully?