opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
### Describe the feature There's a recent paper "GPQA: A Graduate-Level Google-Proof Q&A Benchmark" (https://arxiv.org/abs/2311.12022) which is useful for helping to devise ways for human experts to reliably get truthful...
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment {'CUDA available': True, 'CUDA_HOME': '/usr/local/cuda', 'GCC': 'x86_64-linux-gnu-gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0', 'GPU...
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type I'm evaluating with the officially supported tasks/models/datasets. ### Environment - ### Reproduces the problem - code/configuration sample - ### Reproduces the problem - command or script -...
### Describe the feature Some community models have adopted the methods from the "Let’s Verify Step by Step" paper for training, so there is a risk of Math data leakage....
### Describe the feature - https://hotpotqa.github.io/ - https://arxiv.org/pdf/1809.09600.pdf - https://arxiv.org/pdf/2304.10513.pdf ### Will you implement it? - [ ] I would like to implement this feature and create a PR!
### Prerequisite - [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the expected help. - [X] The bug has not been fixed in the [latest version](https://github.com/open-compass/opencompass). ### Type...
### Describe the feature As base models (pre-trained models without SFT) do not predict after the EOS token during the pre-training stage, setting `add_special_tokens=True` by default is not...
### Describe the feature Does OpenCompass currently support only CoLA, MRPC, and QQP from GLUE? When will QNLI be supported? ### Will you implement it? - [ ] I would like to implement this feature and create a PR!
### Describe the feature Evaluation of CasualLLM (OpenCompass Leaderboard) ### Will you implement it? - [ ] I would like to implement this feature and create a PR!