opencompass
OpenCompass is an LLM evaluation platform that supports a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
### Describe the feature What configuration steps does a custom model API require, apart from writing the config file? ### Will you implement it? - [ ] I would like to implement this feature and create a PR!
### Describe the feature Hugging Face: https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro?row=0 ### Will you implement it? - [ ] I would like to implement this feature and create a PR!
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
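### Describe the feature Every question in the TriviaQA dataset already includes a "?", yet the prompt in triviaqa_gen_2121ce.py appends another "?" at the end. Can this extra question mark be removed, or does keeping it make performance better and more stable? ### Will you implement it? - [X] I would like to implement this feature and create a PR!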
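The duplication described above can be illustrated with a minimal sketch. The `TEMPLATE` string is a hypothetical stand-in, not the actual prompt from triviaqa_gen_2121ce.py, and `build_prompt` is likewise a hypothetical helper; the sketch only shows how a question's own "?" collides with the template's "?" and one way to normalize it. Whether the extra mark changes scores is an empirical question that this code does not answer.

```python
# Hypothetical template; the real prompt in triviaqa_gen_2121ce.py may differ.
TEMPLATE = "Q: {question}?\nA:"


def build_prompt(question: str, template: str = TEMPLATE) -> str:
    """Fill the template without doubling the question mark.

    Strips trailing whitespace, then any trailing '?' characters, so the
    template's own '?' is the only one that remains.
    """
    cleaned = question.rstrip().rstrip("?")
    return template.format(question=cleaned)


# Illustrative question, not taken from the dataset.
question = "Which planet is known as the Red Planet?"
print(repr(TEMPLATE.format(question=question)))  # naive fill: ends with '??'
print(repr(build_prompt(question)))              # normalized: a single '?'
```

Stripping on the data side keeps the template unchanged, so questions that lack a trailing "?" still receive exactly one.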
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
### Describe the feature Title ### Will you implement it? - [ ] I would like to implement this feature and contribute code to OpenCompass!
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment {'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0', 'GPU...
### Describe the feature The file opencompass/opencompass/datasets/agieval/agieval.py shows that AGIEval evaluation currently supports only the zero-shot setting: ``` def load(path: str, name: str, setting_name: str): from .dataset_loader import load_dataset, load_dataset_as_result_schema assert setting_name in 'zero-shot', 'only support zero-shot setting' dataset_wo_label = load_dataset(name, setting_name, path)...
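One detail worth noting in the quoted loader: `setting_name in 'zero-shot'` tests against a string literal, which in Python is a substring test, not an equality check, so partial names such as `'zero'` or even `''` also pass. A minimal plain-Python sketch of the difference follows; `SUPPORTED_SETTINGS` and `is_supported_setting` are hypothetical names for illustration, not part of OpenCompass, and adding few-shot support would additionally require loader changes not shown here.

```python
# `in` against a str literal is a substring test, so all of these print True.
for name in ["zero-shot", "zero", "shot", "-", ""]:
    print(repr(name), name in "zero-shot")

# Hypothetical whitelist; a few-shot setting name would be added here once supported.
SUPPORTED_SETTINGS = ("zero-shot",)


def is_supported_setting(setting_name: str) -> bool:
    """Return True only for an exact match against a supported setting name."""
    # Membership in a tuple compares whole strings, unlike substring matching on a str.
    return setting_name in SUPPORTED_SETTINGS


print(is_supported_setting("zero-shot"))  # exact match
print(is_supported_setting("zero"))       # partial name is rejected
```

A caller could raise `ValueError` when `is_supported_setting` returns False instead of relying on `assert`, which is skipped when Python runs with `-O`.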
### Describe the feature Hi, I found some typos in the efficient evaluation section of the official documentation. In the following code snippet, https://github.com/open-compass/opencompass/blob/6c711cb262344b8819894a61f2791d5674e5cf73/docs/en/user_guides/evaluation.md?plain=1#L88-L100, line 97 `task=dict(type=OpenICLEvalTask)` should be `task=dict(type=OpenICLInferTask)`. And...
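For reference, here is a minimal config sketch of the pattern the report describes, assuming the `infer`/`eval` layout used in the OpenCompass evaluation guide. The inference stage's runner runs `OpenICLInferTask`, and only the evaluation stage runs `OpenICLEvalTask`. The partitioner choices, `LocalRunner`, and worker counts are illustrative assumptions, not a copy of the linked lines. This is a config fragment and needs OpenCompass installed.

```python
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLEvalTask, OpenICLInferTask

# Inference stage: split work into size-bounded tasks and run model inference.
infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=5000),  # illustrative size
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,                 # illustrative worker count
        task=dict(type=OpenICLInferTask),  # inference task, not OpenICLEvalTask
    ),
)

# Evaluation stage: score the predictions produced by the inference stage.
eval = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=32,                # illustrative worker count
        task=dict(type=OpenICLEvalTask),
    ),
)
```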