code-eval

Any plans on running evals for codellama?

Open · ErikBjare opened this issue 2 years ago · 2 comments

I'm keeping https://github.com/ErikBjare/are-copilots-local-yet up-to-date, and would love to see some codellama numbers given it's now SOTA :)

ErikBjare · Aug 25 '23 08:08

I would be interested in this as well. I made some attempts on my own with the Python-7B and Instruct-7B models, but if I use the same code as for Llama-2 the performance is horrible (3% and 8%, respectively). For comparison, with the exact same code, Llama-2-chat-7b gives me 11%.

nicoladainese96 · Sep 14 '23 12:09

> I would be interested in this as well. I made some attempts on my own with the Python-7B and Instruct-7B models, but if I use the same code as for Llama-2 the performance is horrible (3% and 8%, respectively). For comparison, with the exact same code, Llama-2-chat-7b gives me 11%.

I'm running into the same situation. Even when I use the instructions in "core/prompts.py", the pass@1 for codellama-7b is 22.8%, still lower than the number reported in the official documentation by a large margin. Have you fixed this problem?
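
For reference, here is a minimal sketch of the kind of generation loop I'm running (assuming the `transformers` library and the `codellama/CodeLlama-7b-hf` checkpoint from the HuggingFace Hub; the prompt below is a toy HumanEval-style example, not the actual instructions from "core/prompts.py"):

```python
# Minimal sketch: greedy-ish sampling of one HumanEval-style completion.
# The checkpoint name and sampling parameters are assumptions, not the
# exact settings used by code-eval.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "codellama/CodeLlama-7b-hf"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# A single HumanEval-style prompt: function signature plus docstring.
prompt = (
    "def add(a: int, b: int) -> int:\n"
    '    """Return the sum of a and b."""\n'
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.2,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,
    )

# Keep only the newly generated tokens (drop the prompt), then decode.
completion = tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prompt + completion)
```

If the prompt formatting here differs meaningfully from what the repo expects, that alone might explain part of the gap, so I'd be glad to compare setups.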

smart-lty · Jan 09 '24 03:01