
Question on weights

Open siddhsql opened this issue 1 year ago • 1 comments

Hello - thanks for this library. Great work!

I am trying to understand how to interpret the OpenLLaMA weights. As I understand it, LLaMA is two things:

  • the NN architecture, which is a deep graph of artificial neurons describing their connectivity and activation functions
  • the weights connecting these neurons (the weights of the edges in the graph)
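To make the distinction concrete, here is a minimal toy sketch (plain Python, not LLaMA itself; all names and values are made up for illustration). The "architecture" is the fixed forward computation; the "weights" are the numbers you plug into it, so two training runs on different data yield different weights for the same architecture:

```python
import math

def forward(weights, x):
    # Architecture (fixed): one hidden neuron with tanh activation,
    # followed by a linear output neuron.
    h = math.tanh(weights["w1"] * x + weights["b1"])
    return weights["w2"] * h + weights["b2"]

# Two hypothetical "checkpoints": same architecture, different training
# runs (e.g. different datasets) -> different weights.
checkpoint_a = {"w1": 0.5, "b1": 0.1, "w2": -1.2, "b2": 0.3}
checkpoint_b = {"w1": 0.48, "b1": 0.11, "w2": -1.19, "b2": 0.29}

# Same input through the same architecture, but different weights,
# gives different outputs.
assert forward(checkpoint_a, 1.0) != forward(checkpoint_b, 1.0)

# Loading identical weights into the same architecture reproduces
# identical behavior.
assert forward(checkpoint_a, 1.0) == forward(dict(checkpoint_a), 1.0)
```

In real frameworks the same split shows up as, e.g., a PyTorch `nn.Module` (architecture) versus its `state_dict` (weights), which is why OpenLLaMA can ship new weights that drop into the existing LLaMA architecture.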

What the OpenLLaMA project did was take the original LLaMA paper, develop its own implementation of the NN (#1 above), and then train that NN on the RedPajama dataset. Is this correct?

Why not train the NN on the same dataset that was used to train the original LLaMA? Then you would get the same weights as the original LLaMA (theoretically speaking).

siddhsql avatar Jun 21 '23 15:06 siddhsql

Replicating the exact dataset from the description in the LLaMA paper isn't possible unless you can reproduce their entire data collection and filtering pipeline precisely. Check out section 2 of the paper for details: https://arxiv.org/pdf/2302.13971.pdf

snichols avatar Jun 21 '23 16:06 snichols

The original LLaMA dataset was not released, so we cannot use it. We chose the RedPajama dataset, which is a reproduction of the LLaMA dataset according to the details in the paper.

young-geng avatar Jul 07 '23 07:07 young-geng