opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Results 431 opencompass issues

## Motivation When I tested the Qwen3 model on mbpp, the existing code could not process outputs of the form 【BEGIN】```python, so I added patterns to handle that case. ## Modification Adjusted the mbpp patterns. ## BC-breaking (Optional) None. ## Use cases (Optional) None. ## Checklist **Before PR**: - [ ] Pre-commit or other linting tools are used...
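To illustrate the kind of pattern change this PR describes, here is a minimal sketch of extracting code when a BEGIN marker is immediately followed by a fenced python block. The function name `extract_code`, the pattern list, and the fallback order are assumptions made for illustration; they are not the patterns used in OpenCompass's mbpp evaluator.

```python
import re

# Build the three-backtick fence programmatically so it can be reused in patterns.
FENCE = "`" * 3

# Hypothetical patterns (NOT OpenCompass's actual ones), most specific first.
_PATTERNS = [
    # BEGIN marker (ASCII or full-width brackets) followed by a fenced block.
    rf"(?:\[BEGIN\]|【BEGIN】)\s*{FENCE}(?:python)?\s*\n(.*?){FENCE}",
    # BEGIN ... DONE markers with no fence in between.
    r"(?:\[BEGIN\]|【BEGIN】)\s*(.*?)\s*(?:\[DONE\]|【DONE】)",
    # A bare fenced block with no markers.
    rf"{FENCE}(?:python)?\s*\n(.*?){FENCE}",
]


def extract_code(text: str) -> str:
    """Return the first code span matched by the patterns, else the stripped text."""
    for pattern in _PATTERNS:
        match = re.search(pattern, text, re.DOTALL)
        if match:
            return match.group(1).strip()
    return text.strip()
```

Trying the most specific pattern first keeps the fence characters out of the extracted code when both a marker and a fence are present.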

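### Describe the feature Suppose I have 4 GPUs, each serving a different 7B model deployed with vLLM. I would like it to send requests to these models in parallel, rather than finishing the entire evaluation of the first model before starting to query the second. ### Will you implement it? - [ ] I would like to implement this feature and create a PR!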
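The behavior requested here could be sketched roughly as follows, with one worker thread per model endpoint so that no model waits for another to finish. Everything in this sketch is hypothetical: `evaluate_model`, `evaluate_all`, and the `query` callable (a stand-in for whatever HTTP call reaches a vLLM server) are illustrative names, not OpenCompass APIs.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(endpoint, prompts, query):
    """Send every prompt to a single endpoint, in order."""
    return [query(endpoint, prompt) for prompt in prompts]


def evaluate_all(endpoints, prompts, query):
    """Evaluate all endpoints concurrently, one worker thread per endpoint.

    `query(endpoint, prompt)` is a caller-supplied placeholder for the
    request to a model server; it is an assumption of this sketch.
    """
    # max_workers must be positive, so guard against an empty endpoint list.
    with ThreadPoolExecutor(max_workers=max(1, len(endpoints))) as pool:
        futures = {
            endpoint: pool.submit(evaluate_model, endpoint, prompts, query)
            for endpoint in endpoints
        }
        # result() blocks until that endpoint's evaluation has finished.
        return {endpoint: future.result() for endpoint, future in futures.items()}
```

Threads are a reasonable fit under the assumption that each request is I/O-bound (waiting on a remote server), so the GIL is not the bottleneck.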

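### Prerequisite - [x] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [x] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment python ### Reproduces the problem - code/configuration sample python ### Reproduces the problem - command or script agieval_gen_617738.py...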

### Prerequisite - [x] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [x] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...

[fix] Update README.md to fix a typo. Original: `to the field of factuality` Fixed: `to the field of factuality`

### Describe the feature Problem Description Currently, when OpenCompass performs large-scale model inference (infer), if a task is interrupted unexpectedly (e.g., due to resource failures, manual termination, etc.), it requires...

### Prerequisite - [x] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [x] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...

### Prerequisite - [x] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [x] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...

### Describe the feature https://github.com/open-compass/opencompass/blob/5fd489994757e32bd64d8fbf3136bf71498c2a35/opencompass/configs/datasets/PHYBench/phybench_gen.py#L18 It looks like a space is missing before "Remember". ### Will you implement it? - [ ] I would like to implement this feature and create a PR!

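### Describe the feature In examples/eval_subjective.py, I changed judge_models to the vllmwithchattemplate form, but the evaluation does not seem to work properly: the final alpaca eval output is empty. Does the subjective evaluation script support using a local model as the judge model? ### Will you implement it? - [ ] I would like to implement this feature and create a PR!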