bigcodebench issues

Seeding / Reproducibility

* It would be nice to have seeding. If we are using this package for research, our results need to be reproducible.
* Currently, there is no way...

TerryTong-Git
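For the seeding request above, here is a minimal sketch of what seeding could look like. It assumes the relevant randomness comes from Python's standard `random` module. The helpers `set_seed` and `sample_task_ids` are hypothetical and are not part of bigcodebench.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Hypothetical helper: seed Python's global RNG for reproducible runs.

    Setting PYTHONHASHSEED here only affects child processes started
    afterwards (e.g. subprocess-based test execution), not the current one.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)


def sample_task_ids(task_ids, k, seed):
    """Hypothetical helper: draw a reproducible subset of task IDs."""
    rng = random.Random(seed)  # local RNG, so other code cannot disturb the stream
    return rng.sample(task_ids, k)


if __name__ == "__main__":
    ids = [f"BigCodeBench/{i}" for i in range(100)]
    first = sample_task_ids(ids, 5, seed=42)
    second = sample_task_ids(ids, 5, seed=42)
    assert first == second  # same seed, same subset
    print(first)
```

Using a local `random.Random` instance rather than the global generator keeps the sampled subset stable even if other code also draws random numbers.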

🤗 [REQUEST] - Qwen3-Next series

### Model introduction The new mixed precision model is claimed to be similar to the other Qwen3 models ### Model URL https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d ### Additional instructions (Optional) _No response_ ### Author...

BradKML

Gradio does not work properly

1 comment

Loaded as API: https://bigcode-bigcodebench-evaluator.hf.space/ ✔
Traceback (most recent call last):
  File "", line 1, in
  File "/home/ma-user/anaconda3/envs/evalplus_env/lib/python3.9/site-packages/gradio_client/client.py", line 171, in __init__
    self._info = self._get_api_info()
  File "/home/ma-user/anaconda3/envs/evalplus_env/lib/python3.9/site-packages/gradio_client/client.py", line 566, in _get_api_info...

p81sunshine

🐛 [TaskRemoval/TaskRepair] - <120> <Unexpected test cases>

### BigCodeBench version 1.0.4 ### Output of running `ls ~/.cache/bigcodebench` BigCodeBench-Hard-v0.1.4.jsonl ### Task ID of the programming task BigCodeBench/120 ### The original complete prompt ```python For the test case: def...

bajinsheng

🤗 [REQUEST] - GPT-OSS 20B and 120B

### Model introduction OpenAI finally released the promised models, yet they are optimized more for instruction following than for writing code. It would be good if the model...

BradKML

where is the file: audio.wav?

BigCodeBench/227 where is the file: audio.wav?

xinghang5029

🤗 [REQUEST] - Kimi K2 + Kimi-Dev

### Model introduction This is a 1T-parameter model that rivals the flagship models of OpenAI and Anthropic on standard benchmarks. BigCodeBench, however, is not among them. ### Model...

BradKML

Is it normal for the ground truth accuracy not to be 100%?

Under this setting, my evaluation results for qwen2.5coder-instruct-3b are better than those reported in the official technical report.

p81sunshine

🤗 [REQUEST] - Arsh-llm-0.7b

2 comments

### Model introduction The model was created by Arshia Afshani and is used to generate text in a structured way. ### Model URL https://huggingface.co/arshiaafshani/Arsh-llm-0.7b ### Additional instructions (Optional) _No response_ ###...

arsh-team

TypeError: pass_k int not iterable in evaluate.py

2 comments

Hi, here is the command I am using: `bigcodebench.evaluate --execution local --split complete --subset full --samples /scratch3/workspace/wenlongzhao_umass_edu-reason/dev_kedar/Small-LLM-Reasoning/scratch/notebooks/output.jsonl --pass_k "1"` I was trying to evaluate the results locally, and I used...

KedarnathKC
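The `pass_k int not iterable` error in the issue above suggests that a single `--pass_k` value reached code that loops over it. One plausible cause, unconfirmed, is that a CLI parser turns the string "1" into the integer 1. Here is a minimal sketch of defensive normalization under that assumption. `normalize_pass_k` is a hypothetical helper, not the actual bigcodebench code or fix.

```python
from typing import Iterable, List, Union


def normalize_pass_k(pass_k: Union[int, str, Iterable[int]]) -> List[int]:
    """Hypothetical helper: coerce a --pass_k value into a sorted list of ints.

    Accepts a single int (1), a comma-separated string ("1,5,10"),
    or any iterable of ints such as the tuple (1, 5, 10).
    """
    if isinstance(pass_k, int):
        values = [pass_k]  # a bare int is what triggers "not iterable"
    elif isinstance(pass_k, str):
        values = [int(part) for part in pass_k.split(",") if part.strip()]
    else:
        values = [int(k) for k in pass_k]
    if any(k < 1 for k in values):
        raise ValueError(f"pass@k values must be >= 1, got {values}")
    return sorted(set(values))  # drop duplicates, report k in ascending order
```

With this in place, a loop such as `for k in normalize_pass_k(pass_k)` behaves the same whether the user passes `1`, `"1"`, or `"1,5,10"`.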
