llama
Missing backward method in transformer block
Thank you for open-sourcing the code. I noticed that the transformer block class definition does not include the manually implemented backward function mentioned in the paper. It would be great if this function could be added.
A short training-code sample showing how best to use this optimization would also be valuable to anyone trying to reproduce the results.
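To make the request concrete, here is a minimal sketch of the general pattern I have in mind, assuming the manual backward would be written as a `torch.autograd.Function`. The names `ManualLinear` and `check_against_autograd` are hypothetical and my own, and this toy linear layer is only an illustration, not the transformer block from the paper or the repository.

```python
import torch


class ManualLinear(torch.autograd.Function):
    """Hypothetical example: y = x @ w.T with a hand-written backward."""

    @staticmethod
    def forward(ctx, x, w):
        # Explicitly choose which tensors are kept for the backward pass.
        ctx.save_for_backward(x, w)
        return x @ w.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        grad_x = grad_out @ w       # gradient w.r.t. the input, shape of x
        grad_w = grad_out.t() @ x   # gradient w.r.t. the weight, shape of w
        return grad_x, grad_w


def check_against_autograd():
    """Compare the hand-written gradients with PyTorch autograd."""
    torch.manual_seed(0)
    x = torch.randn(4, 8, dtype=torch.double, requires_grad=True)
    w = torch.randn(3, 8, dtype=torch.double, requires_grad=True)

    # Gradients from the manual backward.
    ManualLinear.apply(x, w).sum().backward()
    manual_x, manual_w = x.grad.clone(), w.grad.clone()

    # Gradients from ordinary autograd on the same expression.
    x.grad = None
    w.grad = None
    (x @ w.t()).sum().backward()

    return torch.allclose(manual_x, x.grad) and torch.allclose(manual_w, w.grad)
```

Calling `check_against_autograd()` compares the hand-written gradients against autograd on the same inputs. What I cannot work out from the released code is how this pattern is applied to the full transformer block, which is why an official version would help.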
For reference, the part of the paper addressing the manually implemented backward function: