
BigCodeBench: Benchmarking Code Generation Towards AGI

Results 31 bigcodebench issues

* It would be nice to have seeding. If we are using this package for research, it is important that our results are reproducible.
* Currently, there is no way...
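As a minimal sketch of what seeding could look like, assuming only Python's standard library, a harness can seed its own randomness and derive a reproducible task order from a fixed seed. `set_seed` and `sample_order` are hypothetical helpers, not part of bigcodebench, and model-side sampling (numpy, torch, or an inference server) would each need their own seed.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed the stdlib sources of randomness."""
    random.seed(seed)
    # PYTHONHASHSEED only affects interpreters started after this point,
    # e.g. subprocesses that execute generated code.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # numpy / torch / inference-server sampling would need separate seeding (not shown).


def sample_order(task_ids, seed: int):
    """Hypothetical helper: return a reproducibly shuffled copy of task_ids."""
    rng = random.Random(seed)  # private RNG, independent of the global one
    shuffled = list(task_ids)
    rng.shuffle(shuffled)
    return shuffled
```

Using a private `random.Random(seed)` for the task order keeps it stable even if other code draws from the global generator in between.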

### Model introduction
The new mixed precision model is claimed to be similar to the other Qwen3 models.
### Model URL
https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
### Additional instructions (Optional)
_No response_
### Author...

Loaded as API: https://bigcode-bigcodebench-evaluator.hf.space/ ✔
Traceback (most recent call last):
  File "", line 1, in
  File "/home/ma-user/anaconda3/envs/evalplus_env/lib/python3.9/site-packages/gradio_client/client.py", line 171, in __init__
    self._info = self._get_api_info()
  File "/home/ma-user/anaconda3/envs/evalplus_env/lib/python3.9/site-packages/gradio_client/client.py", line 566, in _get_api_info...

### BigCodeBench version
1.0.4
### Output of running `ls ~/.cache/bigcodebench`
BigCodeBench-Hard-v0.1.4.jsonl
### Task ID of the programming task
BigCodeBench/120
### The original complete prompt ```python For the test case: def...

### Model introduction
OpenAI finally released the promised model, and yet it is more or less optimized for instruction following, not writing code. It would be good if the model...

BigCodeBench/227: where is the file `audio.wav`?
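If a task prompt refers to a local `audio.wav` that does not ship with the benchmark, and you only need some valid WAV file to exist for a manual run, a stdlib-only sketch like the one below can create a placeholder tone. This is an assumed workaround, not how the benchmark's own tests provide the file (they may mock it), and `write_test_tone` with its defaults is hypothetical.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, freq_hz=440.0, sample_rate=16000):
    """Hypothetical helper: write a mono 16-bit PCM sine tone to `path`.

    Returns the number of frames written.
    """
    n_frames = int(seconds * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        # Half-scale amplitude keeps every sample well inside the int16 range.
        value = 0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(value))  # little-endian signed 16-bit
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return n_frames
```

For example, `write_test_tone("audio.wav")` writes a one-second 440 Hz tone in the current directory.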

### Model introduction
This is a 1T model that rivals the flagship models of OpenAI and Anthropic on standard benchmarks. BigCodeBench, however, is not among them.
### Model...

Under this setting, my evaluation results for qwen2.5coder-instruct-3b are better than the results reported in the official technical report.

### Model introduction
The model was created by Arshia Afshani and is used to generate text in a structured way.
### Model URL
https://huggingface.co/arshiaafshani/Arsh-llm-0.7b
### Additional instructions (Optional)
_No response_
###...

Hi, here is the command I am using: `bigcodebench.evaluate --execution local --split complete --subset full --samples /scratch3/workspace/wenlongzhao_umass_edu-reason/dev_kedar/Small-LLM-Reasoning/scratch/notebooks/output.jsonl --pass_k "1"` I was trying to evaluate the results locally, and I used...
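The `--pass_k "1"` option in the command above asks for pass@1. As a sketch, assuming bigcodebench follows the standard unbiased estimator from the Codex/HumanEval paper (n samples per task, c of them correct), pass@k for one task is 1 - C(n-c, k) / C(n, k), averaged over tasks. `pass_at_k` and `mean_pass_at_k` are hypothetical helpers, not the package's API. With one sample per task and k = 1, the score reduces to the fraction of tasks whose single sample passes.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples drawn, c of them correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    # 1 minus the probability that a random size-k subset is all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over tasks; `results` is a list of (n, c) pairs."""
    return mean(pass_at_k(n, c, k) for n, c in results)
```

For example, `mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], k=1)` returns 0.5: two of four tasks pass with one sample each.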