code-eval

Run evaluation on LLMs using human-eval benchmark

Results 5 code-eval issues

I'm keeping https://github.com/ErikBjare/are-copilots-local-yet up-to-date, and would love to see some codellama numbers given it's now SOTA :)

![image](https://github.com/abacaj/code-eval/assets/8592144/4ffe64fe-1377-419d-a0b8-64193b6597b6)

I got only 9.7% for llama2-7B-chat on human-eval using your script:

```python
{'pass@1': 0.0975609756097561}
```
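For readers comparing numbers like the one above, here is a minimal sketch of the unbiased pass@k estimator described in the HumanEval paper, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not taken from this repository's script. The 16-of-164 split below is an assumption that happens to match the reported value, not a confirmed breakdown of that run.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k incorrect samples, every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Assumed illustration: 164 problems, one sample each, 16 solved.
results = [(1, 1)] * 16 + [(1, 0)] * 148
print(round(mean_pass_at_k(results, 1), 4))
```

With a single sample per problem, pass@1 reduces to the fraction of problems solved, so differences between reported scores often come from prompt format, stop sequences, or sampling settings rather than from the metric itself.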

I suggest adding support for https://github.com/THUDM/CodeGeeX2. It was just released, and according to the published numbers its pass@1 reaches 35.9.