
Why is this better than llama in some instances?

rick2047 opened this issue 2 years ago · 1 comment

I was going through the README and noticed that this model performs better than the 7B LLaMA on many benchmarks, even though it's trained on a fifth of the tokens (200B vs. 1T). Does anyone understand how this happened?

rick2047 · May 04 '23

Probably GIGO (Garbage In, Garbage Out): the two models are trained on different datasets, so differences in data quality could explain the gap.

ClaudeCoulombe · May 04 '23