opencompass

OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Results 261 opencompass issues

### Describe the feature

BigCode (Hugging Face and ServiceNow Research) released a new large-scale benchmark, BigCodeBench, for code generation with diverse function calls and complex instructions, covering 1140 expert-annotated tasks....

## Motivation

When I tested the IFEval dataset locally, I found that the following error occurs during evaluation.

Traceback (most recent call last):
  File "C:\Users\12072\.projects\PyCharmProjects\evaluation\executor\opencompass\opencompass\tasks\openicl_eval.py", line 364, in inferencer.run()...

### Prerequisite

- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help.
- [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass).

### Type...

### Describe the feature

Support using the following pattern:

```bash
OPENAI_API_KEY=xxxx BASE_URL=xxx opencompass xxxxxxxxx
```

### Will you implement it?

- [ ] I would like to implement this feature...

### Describe the feature

Provide the specific prompt version for common datasets in the README.

### Will you implement it?

- [X] I would like to implement this feature and create a...

### Describe the feature

Give examples of how to conduct evaluation with multiple GPUs.

### Will you implement it?

- [ ] I would like to implement this feature and create...

### Describe the feature

Introduce SGLang and Ollama as inference backends.

### Will you implement it?

- [ ] I would like to implement this feature and create a...

### Describe the feature

1. Improve the evaluation efficiency for SciCode
2. Reduce the cost of h5 data loading

### Will you implement it?

- [ ] I would like...

### Describe the feature

- Use a chat model and gen mode for the primary example in the docs
- Can you provide the CLI commands for these two functions (list datasets, list models)?...

### Prerequisite

- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help.
- [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass).

### Type...