Add functional transformer demo
This notebook demonstrates the acceleration of a transformer model, implemented in a functional style, using Thunder. Key highlights:
- Illustrates Thunder-compatible PyTorch code (this does not mean that more complex, object-oriented code cannot be handled)
- Showcases successful execution of basic prompts using pre-trained weights
- Provides a clear example of performance gains achieved through Thunder optimization (the only transformations involved here are the initial trace construction and `transform_for_execution`)
The primary objective is to explain the characteristics of Thunder-friendly code and verify functionality with loaded pre-trained weights.
The code used in the notebook is adapted from https://gist.github.com/nreHieW/a4ae05d216c5326c9fb9a70fcdda3274
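For readers skimming the PR, the pattern the notebook demonstrates looks roughly like the sketch below. This is a minimal placeholder, not the notebook's actual model: the function name, weight names, and shapes are made up, while `thunder.jit` and `thunder.last_traces` are the real Thunder entry points.

```python
import torch
import thunder

# Stand-in for the notebook's functional transformer: the weights are plain
# tensors passed as arguments instead of living on an nn.Module. The body is
# a placeholder, not the notebook's actual model.
def transformer_forward(tok_embeddings, output_proj, tokens):
    h = torch.nn.functional.embedding(tokens, tok_embeddings)
    # ... attention and MLP blocks elided ...
    return h @ output_proj

tok_embeddings = torch.randn(1000, 64)
output_proj = torch.randn(64, 1000)
tokens = torch.randint(0, 1000, (1, 16))

# thunder.jit accepts plain functions as well as nn.Modules.
jitted_forward = thunder.jit(transformer_forward)
logits = jitted_forward(tok_embeddings, output_proj, tokens)

# Inspect the trace Thunder constructed and transformed for execution.
print(thunder.last_traces(jitted_forward)[-1])
```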
Hi @kiya00, thank you for writing thunder tutorials!
I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder. Is there a particular reason for starting at a functional transformer here? I would venture that the same code with the computation in forward and weights as modules would work as well?
I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder.
That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?
Is there a particular reason for starting at a functional transformer here?
It's the simplest form of PyTorch code, short of a fully imperative style without any functions.
I would venture that the same code with the computation in forward and weights as modules would work as well?
Of course, LitGPT is an example of that.
That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?
I think it's mainly a wording thing. The initial wording looked a bit like the functional part was the key to get it to run with thunder, e.g.
This will give us some insight into how to convert a PyTorch module into a simple "functional" Python function, allowing for seamless integration with Thunder.
seems quite odd to me.
At the other end of the spectrum would be something like "Usually, you can just apply thunder.jit to any PyTorch Module and this is the recommended way, but today we want to use thunder.jit with a transformer that is implemented as a function. Along the way we highlight a couple of things that are not supported by thunder (yet?) and change them...."
The other part is that jitting a module and grabbing thunder.last_traces(tm) would also give you a fully functional transformer, and indeed there are cases where this is very useful.
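For context, a minimal sketch of what's meant here, assuming the toy module and shapes are made up while `thunder.jit` and `thunder.last_traces` are the real API:

```python
import torch
import thunder

# Ordinary object-oriented PyTorch: the weights live on the module.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
    torch.nn.Linear(64, 64),
)

tm = thunder.jit(model)        # jit the module directly
out = tm(torch.randn(2, 64))   # run once so a trace is recorded

# The last trace is a flat program over the module's parameters,
# effectively a functional version of the model.
print(thunder.last_traces(tm)[-1])
```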
Hi @t-vi @IvanYashchuk, I rephrased it a bit. The main purpose of this notebook is to give an example of writing a simple functional Python function for a PyTorch module and to show that Thunder applies to this version as well; there is no implication that the function needs to be converted in any specific way to be compatible with Thunder. Sorry for the confusion in the initial draft. I hope this revision conveys my intention more accurately; please take a look and check whether it expresses what I meant.
If we run this notebook in CI using the Hugging Face weights, the HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.
I think it would be OK to skip it in the CI. (We have not been running full models in it.)
If we run this notebook in CI using the Hugging Face weights, the HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.
Is there any other popular model that is not behind a registration wall?
@lantiga, @t-vi could you please review this new tutorial?
The other question I'd have is whether our use of the code is OK here (did we ask the gist author? do we think the notebook is affected by the copyright of the gist?).
I've left a message to the author on the gist, hopefully we'll get some feedback soon.
Supergood!
In a v2 (absolutely not required in this PR), it might be interesting to compare the functional version built here to the computation trace from jitting LitGPT.