LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
Please add the new DeepSeek-V2 model to your great benchmark: https://huggingface.co/deepseek-ai/DeepSeek-V2
There are new players in the LLM coding arena :) https://huggingface.co/ibm-granite/granite-34b-code-instruct These models show very impressive results. Please add the 34b, 20b, 8b, and 3b versions to your great benchmark.
Hello, thank you for your hard work. I tried to run the benchmark locally (on an RTX 3060 12 GB) but hit issues; I do know, though, that it...
Hello, the LiveCodeBench dataset is Python-only; would you consider supporting multi-language dataset evaluation, especially Java? Thanks.
Before evaluation, run `export ONE_SHOT=1; export BACKTICKS=1`. Also, I commented out the following lines to avoid exceptions:
```
+ # enforce_eager=True,
+ # enable_prefix_caching=True,
```
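A minimal sketch of the setup described above. The env-var names come from the comment itself; the commented-out generation command is an assumption about the repository's usual entry point, so check the README for the exact CLI:

```shell
# Flags mentioned in the comment above:
export ONE_SHOT=1    # use a one-shot prompt
export BACKTICKS=1   # expect generations wrapped in backtick fences

# Hypothetical generation step (placeholder, verify against the repo README):
# python -m lcb_runner.runner.main --model <model_name> --scenario codegeneration

echo "ONE_SHOT=$ONE_SHOT BACKTICKS=$BACKTICKS"
```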
This PR adds the prompt changes for the newly released Dracarys family of models.
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
https://github.com/LiveCodeBench/LiveCodeBench/blob/45015dd2a9fa4bf445613e2f29da505dc0ca5c03/lcb_runner/prompts/code_generation.py#L56 It looks like the cllama prompt always has the following appended: `f"### ANSWER (use the provided delimiters, read the inputs from stdin and write response to stdout)\n\n"` This seems improper,...
https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
```
>>> lcb_codegen = load_dataset("livecodebench/code_generation_lite", version_tag="release_v2")
Traceback (most recent call last):
  File "", line 1, in
  File "load.py", line 2587, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "builder.py", ...
```