Songyang Zhang comments

Results: 223 comments of Songyang Zhang

[Feature] Add a new evaluation dataset

`models` is a list of dicts, so you can evaluate multiple models with one config.
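A minimal sketch of what such a list could look like. It uses plain dicts only; the field names (`abbr`, `path`, `max_out_len`) and their values are illustrative assumptions, and a real config would also need the model class and any other fields the framework requires.

```python
# Hypothetical config sketch: each dict describes one model to evaluate.
# The keys and values below are placeholders, not a complete configuration.
models = [
    dict(abbr="model-a", path="path/to/model-a", max_out_len=100),
    dict(abbr="model-b", path="path/to/model-b", max_out_len=100),
]

# Every entry in the list is evaluated under the same config.
for model in models:
    print(model["abbr"], model["path"])
```

Adding another model to the same evaluation run then only means appending another dict to the list.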

[Feature] Retain more detailed evaluation information when evaluating code-related datasets

Thanks for the feature request. We will add this feature to our Q4 backlog. PRs are also welcome! Thanks again.

[Feature] Retain more detailed evaluation information when evaluating code-related datasets

> Have you ever tested the performance of APIs such as GPT on the human eval dataset, and how did you test it?

Please check our documentation for more details.

[Feature] Retain more detailed evaluation information when evaluating code-related datasets

You can currently use `--dump-eval-details`: https://github.com/open-compass/opencompass/blob/001e77fea236276aa8018b34cd23076145ab1672/run.py#L127 Feel free to re-open if needed.

[Feature] Add a new evaluation dataset

@White-Friday Please check Flores. Feel free to re-open if needed.

[Bug] The program frequently stops running during execution

Thanks. 1. The error message indicates an internet connection issue: ``` huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the cached path. Please try again...

[Bug]

It appears that the inference is functioning correctly, so could you please provide a more detailed description of the bug you are encountering?

[Bug] The program frequently stops running during execution

Thanks. Could you provide an example config? We can use it to try to reproduce this issue.

[Feature] I try to use instruction tuning to fine-tune llama model and get a new alpaca model. How to use compass to evaluate the local alpaca model on MMLU and other datasets

Is the model in LLaMA format or Hugging Face format?

[Bug] The gen dialogue templates of ceval, cmmlu, and mmlu behave inconsistently; the mmlu dialogue template has a problem

Thanks for the report. Please try the prompt templates ``` mmlu_gen_23a9a9 mmlu_gen_79e572 ``` We will investigate the impact of the problem you mentioned.