opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

Results 261 opencompass issues

### Describe the feature Including all the evaluation results shown on the webpage? ### Will you implement it? - [ ] I would like to implement this feature and create...

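### Describe the feature Previously, the multimodal leaderboard seemed to evaluate several benchmarks together (similar to how the LLM leaderboard combines several datasets), but now only MMBench remains. Have the evaluations for the other datasets been moved somewhere else? Is there any chance they could be restored? ### Will you implement it? - [ ] I would like to implement this feature and create a PR!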

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...

### Describe the feature Backlog: add user documentation for concurrent API usage ### Will you implement it? - [X] I would like to implement this feature and create a PR!

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand...

### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment {'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (Debian 10.2.1-6) 10.2.1 20210110',...

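### Describe the feature For code-related datasets such as HumanEval, save which samples failed and which passed during evaluation, along with the specific execution error information (the traceback, or at least the error message, ideally pointing to the specific line of code). This makes it easier to track down problems, rule out the influence of post-processing, and distinguish trivial errors (such as a missing import). A report-style summary broken down by error type would be even better. ### Will you implement it? - [ ] I would like to implement this feature and create a PR!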

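### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment - ### Reproduces the problem - code sample - ### Reproduces the problem - command or script -...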

### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...