Songyang Zhang comments

Results: 223 comments by Songyang Zhang

[Feature] Add GPQA benchmark?

Thanks for the insightful suggestion. We will add this to our backlog. Contributions are also welcome.

> The multimodal evaluation described at [https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/multimodal_eval.html](https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/multimodal_eval.html) uses python run.py configs/multimodal/tasks.py --mm-eval in opencompass. Is this still supported? Running it currently fails with an error, and the leaderboard says that VLMEvalKit was used.

Please try VLMEvalKit. VLM evaluation has been deprecated in the opencompass repo.

[Feature] Multimodal leaderboard

[Bug] Different result on mmlu between opencompass and lm-evaluation-harness

UnboundLocalError: local variable 'prompt_token_num' referenced before assignment and NO OUTPUTS

[Bug] Failed to reproduce llama2-70b-base on triviaqa

[Feat] Support Knowledge-based Retriever

[Feature] Suggestion: add evaluation for Embedding models

Integrate pytorch poc python api