
[Feat] Support Knowledge-based Retriever

zhangyikaii opened this issue

Thank you very much for your contributions to the community. The open-compass/opencompass project is truly outstanding, and I plan to build further research on the OpenCompass foundation.

In this Pull Request, I introduce a new KnowledgeRetriever, built on LangChain [Code], that brings a knowledge base into LLM evaluation. It provides the following:

  • Before answering, the LLM is given relevant information retrieved from the knowledge base (implemented as a vector store) for each test example. Users specify which knowledge base files to use through the knowledge_docs parameter in infer_cfg.
  • Retrieval can be driven by specific keys of the question rather than the whole question. For instance, it can match only content related to the question's options. This is configured through retrieve_keys in infer_cfg.
  • All of this is implemented in a single new file, opencompass/openicl/icl_retriever/icl_knowledge_retriever.py, which adds one new icl_retriever. The config logic and methods follow the existing framework, with clear comments and a consistent coding style.
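The per-example, per-key retrieval described above can be illustrated without LangChain. This is a minimal sketch, not the PR's code: it swaps the vector store for character-bigram cosine similarity (which needs no Chinese word tokenizer), and `ToyKnowledgeStore`, `retrieve_for_example`, and the `"; "` separator between per-key hits are hypothetical. Only the idea of querying once per entry in `retrieve_keys` comes from this PR; the separator is merely inferred from the example prompt shown later.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping character bigrams: a crude stand-in for an embedding
    model (assumption), usable on Chinese text without a word tokenizer."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyKnowledgeStore:
    """Hypothetical in-memory replacement for the vector store built from
    knowledge_docs: one bigram vector per knowledge chunk."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.vectors = [char_bigrams(c) for c in self.chunks]

    def search(self, query: str, k: int = 1) -> list:
        """Return the k chunks most similar to the query."""
        q = char_bigrams(query)
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def retrieve_for_example(store, example: dict, retrieve_keys, k: int = 1) -> str:
    """Query the store once per selected field (e.g. each answer option)
    instead of with the whole question, then join the per-key hits."""
    groups = []
    for key in retrieve_keys:
        hits = store.search(example[key], k=k)
        groups.append(" ".join(hits))
    return "; ".join(groups)


# Usage with a tiny idiom "dictionary" as the knowledge base.
store = ToyKnowledgeStore([
    "【严阵以待】谓以严整的阵势等待着敌人进犯。",
    "【改朝换代】旧的朝代为新的朝代所代替。",
    "【物归原主】把东西归还原来的主人。",
])
example = {"content": "……", "A": "严阵以待", "B": "物归原主"}
print(retrieve_for_example(store, example, ["A", "B"]))
```

Keying retrieval on the options rather than the long passage keeps each query short and focused, so each candidate idiom pulls in its own dictionary entry instead of the passage's unrelated vocabulary dominating the match.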

This PR adds a LangChain option to the retrievers, which significantly reduces hallucinations in the LLM's answers to some test questions. It also makes it possible to evaluate the LLM's ability to summarize relevant existing knowledge.

In configs/eval_demo_knowledge.py, I provide an example configuration for the KnowledgeRetriever using the FewCLUE_chid dataset (choosing the correct idiom for a given context). The knowledge base is available here: Knowledge Base Link (extraction code: 0g25; place it in ./data/). The resulting prompt looks like this:

"origin_prompt": " 以下是参考内容:【严陈以待】见“严阵以待”。 【严阵以待】亦作“严陈以待”。 谓以严整的阵势,等待着敌人进犯,予以打击。; 借指改朝换代。 多指改朝换代。 【改朝换代】旧的朝代为新的朝代所代替。; 【不同戴天】同“不共戴天”。 【不共戴天】谓不共存于人世间。; 【海北天南】形容距离很远。 【天南海北】①形容距离遥远的不同地区。; 【物归原主】把东西归还原来的主人。; 后用“波谲云诡”以喻文章如波云变化多致。 【云谲波诡】谓像云气和水波那样千态万状,变化无穷。 【波谲云诡】①汉扬雄《甘泉赋》:“於是大厦云谲波诡,摧摧而成观。”; 【视远步高】高视阔步。 【高步阔视】同“高视阔步”。 【高视阔步】形容气宇轩昂或态度傲慢。,结合上述参考内容,考虑接下来的问题: 这意味着,在不久的将来,HJT异质结电池或将迎来爆发,光伏电池或也将迎来从PERC到HJT______的历史性投资机遇期。 自3月18日以来,HJT龙头迈为股份已经大涨58.46%,捷佳纬创已经大涨22.53%。 01 什么是HJT电池? HJT,中文名称异质结电... 请选择______处所填的词 A. 严阵以待 B. 改朝换代 C. 不共戴天 D. 天南海北 E. 物归原主 F. 波谲云诡 G. 高视阔步 请从“A”,“B”,“C”,“D”,“E”,“F”,“G”中进行选择。答: "

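Here, the text between “以下是参考内容:” (“The following is reference content:”) and “结合上述参考内容,考虑接下来的问题:” (“Using the reference content above, consider the following question:”) is the content retrieved from the knowledge base; it is followed by the original question and its answer options.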
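The marker layout just described can be made concrete. A minimal sketch, assuming the retriever simply wraps the retrieved text in the two fixed markers seen in the example (the PR's actual template code is not reproduced here); `build_knowledge_prompt` and `extract_knowledge` are hypothetical helpers. Extracting the span back out is handy when inspecting each example's saved origin_prompt to see exactly what was retrieved.

```python
# Markers copied from the example origin_prompt; the English glosses are ours.
PREFIX = "以下是参考内容:"  # "The following is reference content:"
# The leading "," mirrors the separator seen just before this marker in the example.
BRIDGE = ",结合上述参考内容,考虑接下来的问题:"  # "Using the reference content above, consider the following question:"


def build_knowledge_prompt(knowledge: str, question: str) -> str:
    """Wrap retrieved knowledge and the original question in the marker layout."""
    return f"{PREFIX}{knowledge}{BRIDGE} {question}"


def extract_knowledge(prompt: str) -> str:
    """Return the retrieved content, i.e. the text between the two markers."""
    start = prompt.index(PREFIX) + len(PREFIX)
    end = prompt.index(BRIDGE, start)
    return prompt[start:end]


knowledge = "【物归原主】把东西归还原来的主人。"
question = "请选择______处所填的词 A. 严阵以待 B. 物归原主"
prompt = build_knowledge_prompt(knowledge, question)
print(extract_knowledge(prompt))
```

Because `extract_knowledge` searches for the markers rather than assuming they sit at fixed offsets, it also works on prompts with leading whitespace, like the example above.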

I'm looking forward to your feedback, and if there are any issues with the code, I'm committed to making further improvements. Thank you!

zhangyikaii, Sep 03 '23 11:09