opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) on 100+ datasets.
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand...
### Describe the feature - https://huggingface.co/datasets/openai/MMMLU ### Will you implement it? - [ ] I would like to implement this feature and create a PR!
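Pull Request: Multilingual MMMLU benchmark evaluation implementation. Motivation: The existing MMLU implementation is limited in multilingual settings. This PR therefore adds support for OpenAI's multilingual evaluation set, so that model performance can be observed across tasks in different languages. The goal is a method that can evaluate multiple languages (such as Chinese, French, and Spanish). Modification: This PR makes the following changes: it adds multilingual support to the dataset support, including downloading and preprocessing the corpora; it implements a multilingual MMLU evaluation pipeline so that models can be evaluated in multiple languages; and it updates model evaluation and benchmarking to add multilingual evaluation metrics. BC-breaking (Optional): This change introduces no backward-incompatible changes; all old APIs and methods remain available, and users can switch freely between the new multilingual functionality and the original functionality. Use cases (Optional): This PR adds multilingual capability, letting developers evaluate tasks in multiple languages within one unified framework. Checklist...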
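To make the idea of per-language evaluation metrics in the PR description above concrete, here is a minimal sketch of how multiple-choice accuracy could be aggregated per language. It is not taken from the PR and does not use any OpenCompass API: the function name `per_language_accuracy`, the record layout `(language, predicted, gold)`, and the language tags in the sample data are all assumptions chosen for illustration.

```python
from collections import defaultdict


def per_language_accuracy(records):
    """Return {language: accuracy} for multiple-choice predictions.

    `records` is an iterable of (language, predicted, gold) tuples.
    This layout is hypothetical, not the format used by the PR.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, pred, gold in records:
        total[lang] += 1
        # Compare answer letters case-insensitively, ignoring whitespace.
        if pred.strip().upper() == gold.strip().upper():
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Illustrative sample data; the language tags are placeholders.
records = [
    ("ZH_CN", "A", "A"),
    ("ZH_CN", "B", "C"),
    ("FR_FR", "d", "D"),
    ("ES_LA", "B", "B"),
]
scores = per_language_accuracy(records)
for lang, acc in sorted(scores.items()):
    print(f"{lang}: {acc:.2f}")
```

Keeping the scores keyed by language, rather than pooling everything into one number, is what lets a reader see where a model is weaker in a specific language.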
depends on PR #1198
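### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment As in the title ### Reproduces the problem - code sample mtbench101 can currently only be run one sample at a time; how do I set the batch size? ### Reproduces the problem - command or script As in the title...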