bigcodebench
BigCodeBench: Benchmarking Code Generation Towards AGI
### Model introduction This model was created by arshiaafshani and is used for generating text on minimal hardware. It uses the chat glm chat format. ### Model URL https://huggingface.co/arshiaafshani/Arsh-llm ###...
### Model introduction DeepSeek recently received an update, and R2 is coming out soon, so it would be a good idea to keep tabs on both projects. Addendum:...
### Model introduction base model ### Model URL https://huggingface.co/Qwen/Qwen2.5-Coder-7B ### Additional instructions (Optional) base model ### Author No ### Security - [x] I confirm that the model is safe to...
### Model introduction They published a new model for SWE-bench, but relying on just that one benchmark does not feel like enough: https://mistral.ai/news/devstral P.S. Please also test SWE-Agent LM...
Add Qwen3.
Is there a standard for testing? Why do some test cases need to return None for exception detection and others need to throw an exception? Some need to check the...
I have an OpenAI-compatible endpoint that I'm preparing for evaluation, and I want to know what the FINAL request being sent is, e.g.: (I'm looking at the...
### Model introduction Hi, I wonder how we can add a new model such as the newest discrete diffusion model, https://huggingface.co/GSAI-ML/LLaDA-8B-Instruct. The sampling might be a bit different from LLMs right now...