opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
### Describe the feature BigCode (Hugging Face and ServiceNow Research) released a new large-scale benchmark, BigCodeBench, for code generation with diverse function calls and complex instructions, covering 1140 expert-annotated tasks....
## Motivation When I tested the IFEval dataset locally, I found that this error would occur during the evaluation. Traceback (most recent call last): File "C:\Users\12072\.projects\PyCharmProjects\evaluation\executor\opencompass\opencompass\tasks\openicl_eval.py", line 364, in inferencer.run()...
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
### Describe the feature
Support using the following pattern:
```bash
OPENAI_API_KEY=xxxx BASE_URL=xxx opencompass xxxxxxxxx
```
### Will you implement it?
- [ ] I would like to implement this feature...
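To illustrate the requested pattern, here is a minimal sketch of how a command-line tool could pick up `OPENAI_API_KEY` and `BASE_URL` from the environment. It is not OpenCompass code: the function name `resolve_api_settings`, the returned dictionary layout, and the fallback endpoint in `DEFAULT_BASE_URL` are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Assumption: fall back to the public OpenAI endpoint when BASE_URL is unset.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_api_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read API settings from environment variables (hypothetical helper).

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    """
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of a late HTTP 401.
        raise ValueError("OPENAI_API_KEY is not set")

    # An empty or missing BASE_URL falls back to the default endpoint.
    base_url = env.get("BASE_URL", "").strip() or DEFAULT_BASE_URL

    # Drop a trailing slash so later path joins do not produce "//".
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```

With a helper like this, a command launched as `OPENAI_API_KEY=xxxx BASE_URL=xxx opencompass xxxxxxxxx` would see both values without requiring any edits to a config file.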
### Describe the feature Provide the specific prompt version for common datasets in the README ### Will you implement it? - [X] I would like to implement this feature and create a...
### Describe the feature Give examples of how to run evaluation on multiple GPUs ### Will you implement it? - [ ] I would like to implement this feature and create...
### Describe the feature Introduce SGLang and Ollama as inference backends ### Will you implement it? - [ ] I would like to implement this feature and create a...
### Describe the feature 1. Improve the evaluation efficiency for SciCode 2. Reduce the cost of h5 data loading ### Will you implement it? - [ ] I would like...
### Describe the feature - Use a chat model and gen mode for the primary example in the docs - Can you provide the CLI commands for these two functions (list datasets, list models)?...