[Feature Request]: Q&A knowledge base upgrade to FAQ knowledge base.

Open xxxl123 opened this issue 1 year ago • 3 comments

Is there an existing issue for the same feature request?

  • [X] I have checked the existing issues.

Is your feature request related to a problem?

No response

Describe the feature you'd like

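This request mainly targets question-answering systems, especially customer service or other systems with fixed replies. This article may serve as a reference: https://zhuanlan.zhihu.com/p/50799128. The project already has a Q&A template, the kind with one question paired with one answer. In everyday use, however, the same question can be phrased in many ways, and all of those phrasings should lead to the same result. With the Q&A template, we can achieve this by writing multiple Q&A pairs (this is what I do myself, though I am not sure whether it is a good approach), as follows: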

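{"question": "Does this of yours include installation?", "answer": "Dear customer, once your order arrives we book a technician to install it at your home free of charge, including free removal and replacement of the old unit. The technician will not charge any fee on site and there are no hidden costs, so you can order with confidence! (Zero cost from start to finish.)"}
{"question": "Installation fee", "answer": "Dear customer, once your order arrives we book a technician to install it at your home free of charge, including free removal and replacement of the old unit. The technician will not charge any fee on site and there are no hidden costs, so you can order with confidence! (Zero cost from start to finish.)"}
{"question": "Does this include installation?", "answer": "Dear customer, once your order arrives we book a technician to install it at your home free of charge, including free removal and replacement of the old unit. The technician will not charge any fee on site and there are no hidden costs, so you can order with confidence! (Zero cost from start to finish.)"}
{"question": "Is installation included?", "answer": "Dear customer, once your order arrives we book a technician to install it at your home free of charge, including free removal and replacement of the old unit. The technician will not charge any fee on site and there are no hidden costs, so you can order with confidence! (Zero cost from start to finish.)"}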

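Recently, however, I came across the FAQBot customer service bot project. This technique started appearing around 2018, and the customer service bots of many large companies follow the same logic. I found some of their ideas very good. Although users phrase their questions in all sorts of ways, the overall distribution of questions is stable. Answers, on the other hand, keep changing as the business develops and national policies change. In other words, within the QA data, A changes while Q does not change much. Moreover, questions share a semantic space with other questions, whereas a question and its answer may not share one. In that case it works better to embed texts of the same kind together in the vector database. Their template looks like this: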

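{
  "question": "Does this of yours include installation?",
  "similar_question": ["Installation fee", "Does this include installation?", "Is installation included?"],
  "answer": "Dear customer, once your order arrives we book a technician to install it at your home free of charge, including free removal and replacement of the old unit. The technician will not charge any fee on site and there are no hidden costs, so you can order with confidence! (Zero cost from start to finish.)"
}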
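
To make the intent of this template concrete, here is a minimal Python sketch of the data structure. The names `FaqEntry` and `build_question_index` are hypothetical and not part of RAGFlow or FAQBot, and the answer text is shortened sample data. The sketch only illustrates the idea that every question variant points at one stored answer, so a changed answer is edited in a single place.

```python
from dataclasses import dataclass, field


@dataclass
class FaqEntry:
    """One FAQ record: a canonical question, its paraphrases, one answer."""
    question: str
    answer: str
    similar_question: list[str] = field(default_factory=list)

    def all_questions(self) -> list[str]:
        # Canonical question first, then the paraphrases.
        return [self.question, *self.similar_question]


def build_question_index(entries: list[FaqEntry]) -> dict[str, int]:
    """Map every question variant to the position of its entry.

    Only questions are indexed; each answer is stored once per entry.
    """
    index: dict[str, int] = {}
    for entry_id, entry in enumerate(entries):
        for q in entry.all_questions():
            index[q] = entry_id
    return index


entries = [
    FaqEntry(
        question="Does this include installation?",
        similar_question=["Installation fee", "Is installation included?"],
        answer="Installation is free.",  # shortened sample answer
    )
]
index = build_question_index(entries)

# Updating the answer once affects every question variant.
entries[0].answer = "Installation is free, including removal of the old unit."
print(entries[index["Installation fee"]].answer)
```

With plain Q&A pairs, the same update would have to be repeated for each duplicated row.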

Describe implementation you've considered

No response

Documentation, adoption, use case

No response

Additional information

No response

xxxl123 avatar Apr 28 '24 09:04 xxxl123

I think RAGFlow's Q&A already meets this requirement well. The reason we use RAG is that the LLM can identify similar questions, so we do not need to enumerate every possible question for an answer. RAGFlow will support Q&A file formats such as DOCX, PDF, and MD for more complex Q&A. Currently, only tab-separated CSV files and Excel files are supported.
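
As a rough illustration of how an FAQ-style record could be fed into a tab-separated file, the sketch below expands each entry into one row per question variant. The helper name `faq_to_tsv` is hypothetical, the two-column layout (question, then answer) is an assumption that should be checked against the RAGFlow documentation, and the sample answer is shortened.

```python
import csv
import io


def faq_to_tsv(entries: list[dict]) -> str:
    """Expand FAQ entries into tab-separated question/answer rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for entry in entries:
        # One row per phrasing; every row repeats the same answer.
        questions = [entry["question"], *entry.get("similar_question", [])]
        for q in questions:
            writer.writerow([q, entry["answer"]])
    return buffer.getvalue()


faq = [
    {
        "question": "Does this include installation?",
        "similar_question": ["Installation fee", "Is installation included?"],
        "answer": "Installation is free.",  # shortened sample answer
    }
]
tsv_text = faq_to_tsv(faq)
print(tsv_text)
```

The expansion duplicates the answer in every row, which is the maintenance cost the feature request is trying to avoid.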

KevinHuSh avatar Apr 28 '24 09:04 KevinHuSh

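First of all, thank you for your reply. There is one point I do not quite understand. In principle, RAG can be roughly divided into retrieval (recall) and generation. The LLM only plays a role during generation, while retrieval is still based on the vector database. It is true that semantically similar texts are likely to lie in the same region of the embedding space, so one could say there is some semantic recognition at that stage. But where exactly does the retrieval step use the LLM's capabilities? Shouldn't the right question be found first, and only then the answer be generated by combining what was retrieved with the LLM?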
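
The two-stage split described above can be sketched as follows. This is a toy illustration, not RAGFlow's implementation: `embed` uses character-bigram counts as a stand-in for a real embedding model, `retrieve` and `cosine` are hypothetical helper names, and the sample data is made up. The point is only that the retrieval stage matches the query against stored questions by vector similarity, with no LLM call.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: character-bigram counts (stand-in for a real model)."""
    normalized = text.lower()
    return Counter(normalized[i:i + 2] for i in range(len(normalized) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, qa_pairs: list[tuple[str, str]]) -> tuple[str, str, float]:
    """Return the stored (question, answer) closest to the query, plus its score."""
    query_vec = embed(query)
    best = max(qa_pairs, key=lambda qa: cosine(query_vec, embed(qa[0])))
    return best[0], best[1], cosine(query_vec, embed(best[0]))


qa_pairs = [
    ("Is installation included?", "Installation is free."),
    ("How long does shipping take?", "Shipping takes 3 to 5 days."),
]
matched_question, answer, score = retrieve("is installation included", qa_pairs)

# The generation stage would pass `answer` to an LLM as context; it is omitted here.
print(matched_question, round(score, 3))
```

Because the query is compared with stored questions rather than with answers, both sides of the comparison come from the same semantic space.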

xxxl123 avatar Apr 29 '24 01:04 xxxl123

Correct. Search and retrieval do not involve the LLM.

KevinHuSh avatar May 04 '24 23:05 KevinHuSh