
[Feature request] Optional debugging option to get trace with information on tensor strides along with tensor shapes

Open parthmannan opened this issue 1 year ago • 4 comments

🚀 Feature Request

Currently, computation traces include the generated tensor shapes as comments next to each operation, like:

t908 = torch.nn.functional.linear(t907, t19, t17)  # t908: "cuda:0 bf16[1, 2048, 4096]"
    # t908 = ltorch.linear(t907, t19, t17)  # t908: "cuda:0 bf16[1, 2048, 4096]"
      # t908 = prims.linear(t907, t19, t17)  # t908: "cuda:0 bf16[1, 2048, 4096]"

However, there are some situations where stride information becomes necessary for debugging, such as #583, where a stride difference was causing an illegal memory access in one of the executors. My understanding is that Thunder has consciously decided not to include stride information, so that backends can manage strides on their own and constructed traces are not limited by stride requirements. This feature request does not ask to change that.

Given a fixed set of input tensors, could there be a way to generate computation traces that include both tensor shapes and strides? It could be a requirement that such a trace is only available after one full iteration has executed (so that strides can be recorded), or only up to the point of a failed execution.
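
For illustration, here is a minimal sketch of the kind of post-execution stride recording this is asking for. It is plain PyTorch, not Thunder's API, and the helper name and output format are made up:

```python
import torch

def run_and_record_strides(fn, *args, **kwargs):
    """Hypothetical helper: run `fn` once and collect shape/stride/dtype/device
    for every tensor output, so the information could later be attached to a trace."""
    out = fn(*args, **kwargs)
    outputs = out if isinstance(out, (tuple, list)) else (out,)
    summary = [
        f"shape={tuple(t.shape)} strides={t.stride()} dtype={t.dtype} device={t.device}"
        for t in outputs
        if isinstance(t, torch.Tensor)
    ]
    return out, summary

# Mirrors the linear call in the trace excerpt above.
linear = torch.nn.Linear(4096, 4096, dtype=torch.bfloat16)
x = torch.randn(1, 2048, 4096, dtype=torch.bfloat16)
_, info = run_and_record_strides(linear, x)
print(info)  # e.g. ['shape=(1, 2048, 4096) strides=(8388608, 4096, 1) ...']
```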

cc @carmocca @apaz-cli

parthmannan · Jun 18 '24 04:06

Hi @parthmannan, thank you for filing this!

Let me pick your brain a bit: maybe we could have an advanced debugging tutorial where we define a symbol whose impl prints tensor information (such as strides), plus a transformation that inserts a call to that symbol for every TensorProxy result. Would that help you?
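
For illustration only (this is not Thunder's actual Symbol/transform API, and the names below are made up), the impl of such a debug symbol could be a pass-through that prints runtime metadata, which a trace transformation would then call on every TensorProxy result:

```python
import torch

def debug_tensor_info_impl(t: torch.Tensor, name: str = "t") -> torch.Tensor:
    """Hypothetical impl for a debug symbol: print runtime tensor metadata and
    return the tensor unchanged, so inserting it after any producer does not
    alter the computation."""
    print(f"{name}: shape={tuple(t.shape)} strides={t.stride()} "
          f"dtype={t.dtype} device={t.device} requires_grad={t.requires_grad}")
    return t

# A trace transformation would insert a call like
#   t908 = debug_tensor_info(t908, "t908")
# after every symbol that produces a TensorProxy.
t908 = torch.empty(1, 2048, 4096, dtype=torch.bfloat16)
debug_tensor_info_impl(t908, "t908")
```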

t-vi · Jun 18 '24 05:06

That sounds pretty useful and should satisfy the requirement. Would calling this transformation generate a full computation trace with the required tensor information for every TensorProxy result?

And I am guessing this is far more useful than just having a trace with strides, since one can define a symbol that prints any attribute of a tensor, like requires_grad, etc.?

(Side comment) Re: tutorial: a tutorial would be an awesome start, but it does require users to modify model execution code. The easiest debugging method is simply enabling debug logs via environment variables like TORCH_COMPILE_DEBUG or CUBLASLT_LOG_LEVEL. This doesn't need to be an option for something niche like strides, but in general our debugging is a little more complex, with users required to call functions to grab traces, print them out, etc.

parthmannan · Jun 18 '24 05:06

triage review:

  • let's provide a notebook showing how to debug issues like this
  • after we discover a pattern we like, providing an extensibility point could be useful. One example extensibility point might be a callback that gets invoked on each symbol with (fn, *args, **kwargs), so it could print the strides of all input tensors, for example. We could even consider the callback being able to add comments to the trace to note what happened
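
A very rough sketch of the callback idea above (none of this exists in Thunder today; the hook, its registration, and all names are hypothetical):

```python
import torch

def print_input_strides(fn, *args, **kwargs):
    """Hypothetical per-symbol callback: report the shape and strides of every
    tensor argument before the symbol runs."""
    for i, a in enumerate(args):
        if isinstance(a, torch.Tensor):
            print(f"{getattr(fn, '__name__', fn)} arg{i}: "
                  f"shape={tuple(a.shape)} strides={a.stride()}")

def call_with_callback(callback, fn, *args, **kwargs):
    """Hypothetical executor hook: invoke the callback, then run the symbol."""
    callback(fn, *args, **kwargs)
    return fn(*args, **kwargs)

# Example with a non-contiguous input, the kind of case where strides matter.
x = torch.randn(1, 2048, 4096).transpose(1, 2)  # shape (1, 4096, 2048), non-contiguous
w = torch.randn(4096, 2048)
out = call_with_callback(print_input_strides, torch.nn.functional.linear, x, w)
```

The trace itself would stay stride-free; the callback only reports the concrete strides observed in a particular run.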

mruberry · Jun 24 '24 19:06

> One example extensibility point might be a callback that gets invoked on each symbol with (fn, *args, **kwargs), so it could print the strides of all input tensors, for example. We could even consider the callback being able to add comments to the trace to note what happened

I have something like this at #783; it would be great to get some feedback. @parthmannan I think it will provide a simpler way to query the strides (and/or other data about the runtime arguments).

kshitij12345 · Jul 16 '24 15:07