bigcodebench issues

[Feature Request] Custom Prompt

3

Hi, how would I add a custom prompt, e.g. one with a `{{question}}` placeholder (or something similar) where the code is inserted? I want to test this prompt: ```markdown You are an expert AI programming...

s-smits
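Below is a minimal sketch of the substitution this request describes. It assumes the custom prompt is a plain string containing a `{{question}}` placeholder. `render_prompt` is a hypothetical helper and is not part of BigCodeBench's CLI or API. The template text is illustrative only.

```python
import re

# Hypothetical helper, not a BigCodeBench API. Matches "{{question}}",
# tolerating spaces inside the braces, e.g. "{{ question }}".
PLACEHOLDER = re.compile(r"\{\{\s*question\s*\}\}")


def render_prompt(template: str, question: str) -> str:
    """Return template with every {{question}} placeholder replaced by question."""
    if not PLACEHOLDER.search(template):
        raise ValueError("template has no {{question}} placeholder")
    # A callable replacement inserts the question verbatim, so backslashes
    # in the code are not interpreted as regex group references.
    return PLACEHOLDER.sub(lambda _match: question, template)


# Illustrative template; substitute your own instructions.
template = "Complete the following function.\n\n{{question}}\n"
print(render_prompt(template, "def add(a, b):\n    ..."))
```

The sketch passes a callable to `re.sub` rather than a replacement string. This matters for code prompts, which often contain backslashes (for example in regexes or escape sequences).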

[REQUEST] Remove Reflection-Llama-3.1-70B

**Rationale** It's been proven to be a fraud. Their CEO has repeatedly lied and has not fully come clean about: 1. uploading weights identical to Llama 3; 2. faking API responses...

wakamex

Add status code to mock response for BigCodeBench tasks 211 and 215

1

This PR adds a `200 OK` status code to the mocks in tasks 211 and 215. TODO: - [ ] Verify that the model passes the task with the new behavior - [ ]...

hvaara
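Below is a minimal sketch of why the missing status code matters. It assumes the task code checks the response's status code, as model-generated solutions often do. `fetch_payload` is hypothetical. The HTTP call is injected as the `get` parameter so the example needs only the standard library. The actual tests presumably patch a real HTTP client instead (e.g. with `mock.patch`).

```python
from unittest import mock


def fetch_payload(get, url):
    """Hypothetical task code: fetch url and return the body after a status check."""
    response = get(url)
    # Defensive check that a model-generated solution might reasonably include.
    if response.status_code != 200:
        raise RuntimeError(f"unexpected status: {response.status_code}")
    return response.content


# Mock without a status code: response.status_code is an auto-created
# MagicMock attribute, which compares != 200, so the check above raises.
bare_get = mock.MagicMock()
bare_get.return_value.content = b"payload"

# Mock with an explicit 200 OK status code, as the PR proposes.
ok_get = mock.MagicMock()
ok_get.return_value.status_code = 200
ok_get.return_value.content = b"payload"

print(fetch_payload(ok_get, "https://example.com/data"))  # b'payload'
try:
    fetch_payload(bare_get, "https://example.com/data")
except RuntimeError as exc:
    print("bare mock rejected:", exc)
```

With the status code set, a correct solution passes whether or not it checks the status. In this sketch, that is the behavior the added `200 OK` provides.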

[Roadmap] BigCodeBench Q3 2024 Roadmap

This document outlines the features planned for BigCodeBench in Q3 2024. Please feel free to discuss and contribute, as this roadmap is shaped by the BigCodeBench community. Help Wanted: - [x]...

terryyz

🐛 [TestRemoval/TestRepair] - 211, 215- include status code in mock response

3

EvalPlus version: v0_1_0_hf. Output of running `ls ~/.cache/bigcodebench`: BigCodeBench-v0.1.0_hf.jsonl. Task IDs of the programming tasks: BigCodeBench/211, BigCodeBench/215, and probably some others as well. The original test: ```python...

dmelcer9

bug

[docs] Clarify risks of running evals locally.

The README mentions `...Or if you want to try it locally regardless of the risks ⚠️:`. Explain what the risks of running locally are and how they are mitigated by...

hvaara

documentation

[docs] Add working end-to-end example for a local model

1

It can be tricky to understand exactly how to use the different binaries to run the benchmark end-to-end. I propose adding an example that the user can follow. Bonus points...

hvaara

documentation

Problem Categories

4

Hi, the report shows some GPT-4-annotated problem categories. Would it be possible to share these categories? Thanks!

normster

question

🤗 [REQUEST] - InternLM2.5 7B

1

Model introduction: This is a general-purpose 7B LLM. It has a 1M-token context window and is the best-performing sub-12B model on the Open LLM Leaderboard, so it might be...

ethanc8

🤗 [REQUEST] - multi-token-prediction

Model introduction: This is a family of four models, two of which have been trained to generate 4 tokens per forward pass instead of only a single...

ethanc8
