bigcodebench
BigCodeBench: Benchmarking Code Generation Towards AGI
Hi, how would I add a custom prompt, for example one with a `{{question}}` placeholder so that the code is inserted in between? I want to test this prompt ```markdown You are an expert AI programming...
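The placeholder idea in this question can be pictured as a plain string substitution. The sketch below is not part of BigCodeBench's API: `render_prompt` and the template text are hypothetical, and it only illustrates how a `{{question}}` marker could be swapped for a task prompt before the text is sent to a model.

```python
def render_prompt(template: str, question: str) -> str:
    """Replace the {{question}} marker in a template with the task text."""
    marker = "{{question}}"
    if marker not in template:
        # Fail loudly so a malformed template is not sent to the model as-is.
        raise ValueError(f"template is missing the {marker} placeholder")
    return template.replace(marker, question)


# Hypothetical template; the real prompt in the issue is truncated.
TEMPLATE = "Solve the task below.\n\n{{question}}\n\nReturn only Python code."

if __name__ == "__main__":
    print(render_prompt(TEMPLATE, "Write a function that reverses a string."))
```

Using `str.replace` rather than `str.format` avoids clashes with literal braces, which are common in prompts that contain code.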
**Rationale** It's been proven to be a fraud. Their CEO has repeatedly lied and not fully come clean about: 1. Uploading weights identical to Llama 3 2. Faking API responses...
This PR adds a `200 OK` status code to the mocks in tasks 211 and 215. #### TODO - [ ] Verify model passes task with new behavior - [ ]...
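For context, a mocked HTTP response only looks successful if its status is set explicitly. Otherwise, code that checks the status sees a `MagicMock` attribute instead of `200`. Below is a minimal sketch of that pattern using `unittest.mock`. The names `fetch_bytes` and `make_ok_response` are hypothetical, and this is not the actual test from tasks 211 or 215.

```python
from unittest.mock import MagicMock


def fetch_bytes(url, http_get):
    """Download url with the supplied getter and fail on a non-200 status."""
    response = http_get(url)
    if response.status_code != 200:
        raise RuntimeError(f"unexpected status {response.status_code}")
    return response.content


def make_ok_response(payload: bytes) -> MagicMock:
    """Build a mock response that reports 200 OK and carries payload."""
    response = MagicMock()
    response.status_code = 200  # without this line the status check fails
    response.content = payload
    return response


if __name__ == "__main__":
    fake_get = MagicMock(return_value=make_ok_response(b"data"))
    print(fetch_bytes("https://example.com/file.zip", fake_get))
```

Passing the getter as an argument keeps the sketch self-contained. In a real test the same mock would typically be installed with `unittest.mock.patch`.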
This document includes the features of BigCodeBench Q3 2024. Please feel free to discuss and contribute, as this roadmap is shaped by the BigCodeBench community. ### Help Wanted - [x]...
### EvalPlus version v0_1_0_hf ### Output of running `ls ~/.cache/bigcodebench` BigCodeBench-v0.1.0_hf.jsonl ### Task ID of the programming task BigCodeBench/211, BigCodeBench/215, probably some others as well ### The original test ```python...
The readme mentions `...Or if you want to try it locally regardless of the risks ⚠️:`. Explain what the risks of running locally are, and how they are mitigated by...
It can be tricky to understand exactly how to use the different binaries to run the benchmark end-to-end. I propose adding an example that the user can follow. Bonus points...
Hi, the report shows some GPT-4-annotated categories of problems. Would it be possible to share these categories? Thanks!
### Model introduction This is a general-purpose 7B LLM. It has a 1M token context window, and is the best-performing sub-12B model on Open LLM Leaderboard, so it might be...
### Model introduction This is a family of four models, where two of the models have been trained to generate 4 tokens per forward pass instead of only a single...