Add functional transformer demo
This notebook demonstrates the acceleration of a transformer model, implemented in a functional style, using Thunder. Key highlights:
- Illustrates Thunder-compatible PyTorch code (this does not mean that more complex, object-oriented code cannot be handled)
- Showcases successful execution of basic prompts using pre-trained weights
- Provides a clear example of performance gains achieved through Thunder optimization (the only transformations involved here are the initial trace construction and `transform_for_execution`)
The primary objective is to explain the characteristics of Thunder-friendly code and verify functionality with loaded pre-trained weights.
The code used in the notebook is adapted from https://gist.github.com/nreHieW/a4ae05d216c5326c9fb9a70fcdda3274
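For readers skimming the PR, the pattern the notebook demonstrates looks roughly like the sketch below. This is a minimal placeholder, not the notebook's actual model: the function name, weight names, and shapes are made up, while `thunder.jit` and `thunder.last_traces` are the real Thunder entry points.

```python
import torch
import thunder

# Stand-in for the notebook's functional transformer: the weights are plain
# tensors passed as arguments instead of living on an nn.Module. The body is
# a placeholder, not the notebook's actual model.
def transformer_forward(tok_embeddings, output_proj, tokens):
    h = torch.nn.functional.embedding(tokens, tok_embeddings)
    # ... attention and MLP blocks elided ...
    return h @ output_proj

tok_embeddings = torch.randn(1000, 64)
output_proj = torch.randn(64, 1000)
tokens = torch.randint(0, 1000, (1, 16))

# thunder.jit accepts plain functions as well as nn.Modules.
jitted_forward = thunder.jit(transformer_forward)
logits = jitted_forward(tok_embeddings, output_proj, tokens)

# Inspect the trace Thunder constructed and transformed for execution.
print(thunder.last_traces(jitted_forward)[-1])
```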
Hi @kiya00, thank you for writing thunder tutorials!
I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder. Is there a particular reason for starting at a functional transformer here? I would venture that the same code with the computation in forward and weights as modules would work as well?
I would be very keen to not give the impression that users need to convert their code to functional in order to run it through thunder.
That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?
Is there a particular reason for starting at a functional transformer here?
It's the simplest form of PyTorch code, short of a fully imperative style without any functions.
I would venture that the same code with the computation in forward and weights as modules would work as well?
Of course, LitGPT is an example of that.
That is certainly not part of the plan. Do you have suggestions on what should be changed to avoid making this impression?
I think it's mainly a wording thing. The initial wording looked a bit like the functional part was the key to get it to run with thunder, e.g.
This will give us some insight into how to convert a PyTorch module into a simple "functional" Python function, allowing for seamless integration with Thunder.
seems quite odd to me.
At the other end of the spectrum would be something like "Usually, you can just apply thunder.jit to any PyTorch Module and this is the recommended way, but today we want to use thunder.jit with a transformer that is implemented as a function. Along the way we highlight a couple of things that are not supported by thunder (yet?) and change them...."
The other part is that jitting a module and grabbing thunder.last_traces(tm) would also give you a fully functional transformer, and indeed there are cases where this is very useful.
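For context, a minimal sketch of what's meant here, assuming the toy module and shapes are made up while `thunder.jit` and `thunder.last_traces` are the real API:

```python
import torch
import thunder

# Ordinary object-oriented PyTorch: the weights live on the module.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.GELU(),
    torch.nn.Linear(64, 64),
)

tm = thunder.jit(model)        # jit the module directly
out = tm(torch.randn(2, 64))   # run once so a trace is recorded

# The last trace is a flat program over the module's parameters,
# effectively a functional version of the model.
print(thunder.last_traces(tm)[-1])
```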
Hi @t-vi @IvanYashchuk, I rephrased it a bit. The main purpose of this notebook is to give an example of writing a simple functional Python function for a PyTorch module and to show that Thunder applies to this version as well; there is no implication that the function needs to be converted in any specific way to be compatible with Thunder. Sorry for the confusion in the initial draft. I hope this revision conveys my intention more accurately; please take a look and check whether it expresses what I meant.
If we run this notebook in CI using the Hugging Face weights, the HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.
I think it would be OK to skip it in the CI. (We have not been running full models in it.)
If we run this notebook in CI using the Hugging Face weights, the HF_TOKEN is needed, and the weights need to be downloaded to Meta-Llama-3-8B/consolidated.00.pth in the same folder as the notebook.
Is there any other popular model that is not behind a registration wall?
@lantiga, @t-vi could you please review this new tutorial?
The other question I'd have is whether our use of the code is OK here (did we ask the gist author? do we think the notebook is affected by the copyright of the gist?).
I've left a message to the author on the gist, hopefully we'll get some feedback soon.
Supergood!
In a v2 (absolutely not required in this PR), it might be interesting to compare the functional version built here to the computation trace from jitting LitGPT.