LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
Please add the new DeepSeek-V2 model to your great benchmark: https://huggingface.co/deepseek-ai/DeepSeek-V2
There are new players in the LLM coding arena :) https://huggingface.co/ibm-granite/granite-34b-code-instruct These models show very impressive results. Please add the 34b, 20b, 8b, and 3b versions to your great benchmark.
Hello, thank you for your hard work. I tried to run the benchmark locally (on an RTX 3060 12 GB) but hit issues; I do know, though, that it...
Hello, the LiveCodeBench dataset is Python-only; would you consider supporting multi-language dataset evaluation, especially Java? Thanks.
Before evaluation, run `export ONE_SHOT=1; export BACKTICKS=1`. Also, I commented out the following lines to avoid exceptions:
```
+ # enforce_eager=True,
+ # enable_prefix_caching=True,
```
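A minimal sketch of the setup described above. The env-var names come from the comment itself; the commented-out generation command is an assumption about the repository's usual entry point, so check the README for the exact CLI:

```shell
# Flags mentioned in the comment above:
export ONE_SHOT=1    # use a one-shot prompt
export BACKTICKS=1   # expect generations wrapped in backtick fences

# Hypothetical generation step (placeholder, verify against the repo README):
# python -m lcb_runner.runner.main --model <model_name> --scenario codegeneration

echo "ONE_SHOT=$ONE_SHOT BACKTICKS=$BACKTICKS"
```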
This PR adds the prompt changes for the newly released Dracarys family of models.
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
https://github.com/LiveCodeBench/LiveCodeBench/blob/45015dd2a9fa4bf445613e2f29da505dc0ca5c03/lcb_runner/prompts/code_generation.py#L56 It looks like the cllama prompt always has the following appended: `f"### ANSWER (use the provided delimiters, read the inputs from stdin and write response to stdout)\n\n"` This seems improper,...
https://huggingface.co/mistralai/mamba-codestral-7B-v0.1
```
>>> lcb_codegen = load_dataset("livecodebench/code_generation_lite", version_tag="release_v2")
Traceback (most recent call last):
  File "", line 1, in
  File "load.py", line 2587, in load_dataset
    ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
  File "builder.py", ...
```