opencompass
[Feature] Support BigCodeBench
Describe the feature
BigCode (Hugging Face and ServiceNow Research) has released BigCodeBench, a new large-scale code generation benchmark featuring diverse function calls and complex instructions across 1,140 expert-annotated tasks. It has been officially used by DeepSeek and CodeGeeX4. BigCodeBench is considered a better alternative to HumanEval and other function-level code generation benchmarks (see here).
Will you implement it?
- [ ] I would like to implement this feature and create a PR!