LLM-eval-survey
Add Llama 2 as model evaluated?
Could you please be more specific? Where should we add this model?
In the paper, Llama is mentioned twice, both on page 6.
The first mention comes from a cited paper (Saparov et al., 2023), so it can be kept as is.
The second one,
"Moreover, LLaMA-65B is the most robust open-source LLMs to date, which per- forms closely to code-davinci-002."
could be replaced by
"Moreover, LLAMA 2 70B is the most robust open-source LLMs to date, which performs very closely to GPT-3.5 and PaLM. But there is still a large gap in performance between LLAMA 2 70B and GPT-4 and PaLM-2-L.(Touvron et al., 2023)"
As code-davinci-002 is a code generation model derived from GPT-3, I don't think it is appropriate to compare it with a pretrained model such as LLaMA. Just for your consideration.
Also, I'd suggest adding the Llama 2 paper (Touvron et al., 2023) as a reference.
Thanks for the detailed suggestion! We'll update the paper accordingly.