open_llama
Question on weights
Hello - thanks for this library. Great work!
I am trying to understand how to interpret the OpenLLaMA weights. As I understand it, LLaMA is two things:
- the NN model, which is a deep graph of artificial neurons describing their connectivity and activation functions
- the weights connecting these neurons (the weights of the edges in the graph)
What the OpenLLaMA project did was take the original LLaMA paper, develop its own implementation of the NN (#1 above), and then train that NN on the RedPajama dataset. Is this correct?
Why not train the NN on the same dataset that was used to train the original LLaMA? Then you would (theoretically speaking) get the same weights as the original LLaMA.
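For concreteness, the model-vs-weights split described above can be sketched in plain Python. This is a toy two-neuron net, not LLaMA's actual architecture; the point is only that the "architecture" is a fixed function, while a "checkpoint" is just the numbers plugged into it, so the same graph with different weights computes different things:

```python
import math

def forward(weights, x):
    # "Architecture": a fixed 2-layer net with tanh activation.
    # The same function is used for every set of weights.
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(weights["W1"], weights["b1"])]
    return sum(w * hi for w, hi in zip(weights["W2"], h)) + weights["b2"]

# Two different "checkpoints" for the same architecture
# (hand-picked toy values, not real trained weights):
random_init = {"W1": [[0.1, -0.2], [0.3, 0.4]], "b1": [0.0, 0.0],
               "W2": [0.5, -0.5], "b2": 0.0}
trained     = {"W1": [[1.0, 1.0], [-1.0, 1.0]], "b1": [0.1, -0.1],
               "W2": [2.0, 1.0], "b2": 0.3}

x = [0.5, -0.5]
print(forward(random_init, x))  # same graph, different numbers in ...
print(forward(trained, x))      # ... different output
```

Training never changes `forward` itself; it only moves the numbers in the weight dict, which is why releasing (or retraining) the weights is a separate question from reimplementing the architecture.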
Replicating the exact dataset from the description in the LLaMA paper isn't possible unless you can reproduce precisely what they did. See section 2 of the paper for details: https://arxiv.org/pdf/2302.13971.pdf
The original LLaMA dataset was not released, so we cannot use it. We chose the RedPajama dataset, which is a reproduction of the LLaMA dataset according to the details in the paper.