[BUG] In the latest version, querying a FAISS knowledge base is very slow: 20k chunks take about 10 seconds
Upload and retrieval times are far worse than in version 0.2.
I hit this too. I use xinference (GLM-4 / bge-large-zh) for LLM inference and embedding, with an 11 MB file split into 250-token chunks with 50 overlap, tested on a 4090 server. On 0.2.10 the knowledge base answers quickly; on 0.3.0 it takes more than 20 s! Switching to 500 + 50 chunking, 0.3.0 still needs over 10 s; the slowdown is very noticeable.
This needs investigation: the new retriever adds BM25 retrieval, and we should measure how much extra overhead that introduces.
Same problem here. A longer index-build time would be acceptable, but the hit to retrieval latency is hard to accept.
It has indeed become a lot slower.
BM25 retrieval really is much slower and makes 0.3 unusable for me; hoping the maintainers can investigate and fix it soon.
My guess is BM25 is the problem. The implementation is very naive: it loads every document from the database into memory and only then searches, so the more files you have, the larger the transfer overhead. The algorithm itself is fine; it's the way the data is loaded that's wrong. I switched to an ES (Elasticsearch) database and reading the data is still slow.
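To make the suspected cost pattern concrete, here is a small self-contained sketch (all names are stand-ins, not chatchat code): rebuilding the BM25 state per query tokenizes the whole corpus on every question, whereas building it once at startup pays that cost a single time.

```python
# Stand-in illustration of the overhead pattern: BM25 setup tokenizes the
# whole corpus, and doing that on every query scales with corpus size.
calls = {"tokenize": 0}

def tokenize(text):                      # stand-in for jieba.lcut_for_search
    calls["tokenize"] += 1
    return text.lower().split()

corpus = [
    "paris is the capital of france",
    "faiss does dense vector search",
    "bm25 is a sparse ranking function",
]

def build_bm25_state(texts):             # what per-query initialization redoes
    return [tokenize(t) for t in texts]

# Rebuild-per-query behaviour: 10 questions -> 10 * len(corpus) tokenizations.
for _ in range(10):
    state = build_bm25_state(corpus)
per_query_cost = calls["tokenize"]       # grows with corpus size AND query count

# Build-once behaviour: the tokenization cost is paid only at startup.
calls["tokenize"] = 0
state = build_bm25_state(corpus)
startup_cost = calls["tokenize"]         # fixed, independent of query count
```

With tens of thousands of chunks, the rebuilt path repeats tens of thousands of tokenizations per question, which matches the kind of slowdown reported in this thread.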
+1, much slower than 0.2.
Has a fix for this bug been scheduled? I haven't dared to upgrade to 0.3 because of it.
Same problem; local knowledge-base responses are far too slow.
It's BM25's fault. In the from_vectorstore method of /chatchat-server/chatchat/server/file_rag/retrievers/ensemble.py, temporarily comment out the block that initializes bm25_retriever and use only faiss_retriever; this gives a clear speedup.
Commenting out the bm25_retriever block didn't give me much of an improvement.
I just tried it; with bm25_retriever commented out I get back to 0.2-level speed.
The LLM itself needs some time to answer anyway; as long as end-to-end knowledge-base Q&A stays under 10 seconds, that should count as normal.
In the latest code I commented out the BM25 retrieval part, but now in the Web UI (RAG chat, knowledge-base retrieval) nothing is retrieved no matter what I set the threshold to. What could be the reason? There is a warning: "/home/ubuntu/anaconda3/envs/langchain/lib/python3.10/site-packages/langchain_core/vectorstores.py:342: UserWarning: No relevant docs were retrieved using the relevance score threshold 0.98"
After switching back to the 0.2 method, retrieval is fast again, but recall still feels a bit low.
It reached the original 0.2 efficiency for me, though possibly because my data set is small. On 0.2, knowledge-base Q&A came back within 1 s; now it's about 600 ms slower.
Commenting out that code will certainly affect recall to some extent. The intent of the code is to ensemble faiss_retriever and bm25_retriever with 50% weight each, but the bm25_retriever initialization is rather crude, so performance degrades sharply once the data grows. I haven't dug into the internals; commenting out bm25_retriever is only a temporary workaround, and the real fix has to wait for an official release.
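For reference, the 50/50 combination is, to my understanding, LangChain's EnsembleRetriever performing weighted Reciprocal Rank Fusion; here is a minimal stdlib sketch (function name and toy rankings are mine, not library code) of how two ranked lists get merged:

```python
# Weighted Reciprocal Rank Fusion: each retriever contributes
# weight / (k + rank) per document; k = 60 is the customary RRF constant.
def weighted_rrf(rankings, weights, k=60):
    """rankings: one ranked list of doc ids per retriever."""
    scores = {}
    for ranked, w in zip(rankings, weights):
        for rank, doc in enumerate(ranked):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

faiss_ranked = ["a", "b", "c"]   # toy dense-retrieval ranking
bm25_ranked  = ["b", "d", "a"]   # toy sparse-retrieval ranking
fused = weighted_rrf([faiss_ranked, bm25_ranked], [0.5, 0.5])
```

Even with equal weights, a document ranked highly by both retrievers beats one that only the dense retriever likes, which is why dropping bm25 entirely can hurt recall on keyword-heavy queries.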
That warning means no documents scored above the threshold you set; try lowering it a bit, to around 0.5.
Thanks for the answer. The threshold was indeed set too high; the score returned here is not the same thing as 0.2's similarity_search_with_score_by_vector, which I had misunderstood.
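For anyone else hitting this: the two numbers live on different scales. A sketch, under the assumption that the store uses LangChain's default Euclidean relevance conversion for normalized embeddings (1 - distance / sqrt(2)):

```python
# 0.2's similarity_search_with_score_by_vector returns a raw FAISS L2 distance
# (lower = closer); the 0.3 retriever filters on a normalized relevance score
# (higher = closer). Assumed default conversion for normalized embeddings:
import math

def l2_to_relevance(distance):
    return 1.0 - distance / math.sqrt(2)

# A distance of 0.9 (a reasonable match in 0.2 terms) becomes roughly 0.36
# here, so a relevance threshold of 0.98 rejects almost everything.
relevance = l2_to_relevance(0.9)
```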
The time goes into BM25_retriever re-preprocessing every chunk in the knowledge base on every round of conversation; see https://github.com/chatchat-space/Langchain-Chatchat/issues/4565#issuecomment-2268152564 for details. Here is a simple, verified patch: run the preprocessing once before the service starts and persist the result to a txt file; at chat time, just read that file to obtain the preprocessed result directly. The code has two parts:
Pre-processing in advance:
```python
import os
import json
os.environ['CHATCHAT_ROOT'] = "***/libs/chatchat-server/chatchat"
import jieba
from tqdm import tqdm
from chatchat.server.knowledge_base.kb_service.base import KBServiceFactory
from chatchat.server.db.session import session_scope

preprocess_func = jieba.lcut_for_search
knowledge_base_name = "samples"
kb = KBServiceFactory.get_service_by_name(knowledge_base_name)
with kb.load_vector_store().acquire() as vs:
    documents = list(vs.docstore._dict.values())
texts, metadatas = zip(*((d.page_content, d.metadata) for d in documents))
texts_processed = [preprocess_func(t) for t in tqdm(texts)]
with open('samples.txt', 'w', encoding='utf-8') as file:
    json.dump(texts_processed, file)
```
Reading it back:
Refer to /python3.10/site-packages/langchain_community/retrievers/bm25.py, inside from_texts():
```python
with open('**/libs/chatchat-server/chatchat/data/knowledge_base/samples/samples.txt', 'r', encoding='utf-8') as file:
    texts_processed = json.load(file)
# texts_processed = [preprocess_func(t) for t in texts]   # original line, now disabled
```
That sounds like it should work. What latency are you measuring now? Is it already about the same as with the bm25 block commented out?
yeap
Right now a knowledge base at the 500k-chunk scale is practically unusable: a retrieval takes about 5 minutes, and after removing bm25 the results come back almost instantly.
Is there a solution? Following the suggestions above I tried setting the BM25_retriever weight to 0, with no improvement. I have tens of GB of document data and retrieval is extremely slow.
Which function exactly should I modify?
On my side, commenting out that block makes recall very poor, much worse than 0.2.0.
/python3.10/site-packages/langchain_community/retrievers/bm25.py: from_texts()
Does the "pre-processing in advance" script also go into /python3.10/site-packages/langchain_community/retrievers/bm25.py?
1. The "pre-processing" step goes in a standalone script; running it saves the preprocessed result to a txt file. 2. Then do the "reading" step at the corresponding spot in the function I mentioned: load that txt, and comment out the original line at that position, i.e. texts_processed = [preprocess_func(t) for t in texts]
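The mechanics of the two steps can be checked in isolation with a stdlib round trip (tokenizer and paths are stand-ins): the tokenized chunks are just a list of token lists, so JSON preserves them exactly and the read step recovers precisely what the pre-processing step produced.

```python
import json
import os
import tempfile

def preprocess(text):                     # stand-in for jieba.lcut_for_search
    return text.split()

texts = ["知识库 检索 很慢", "bm25 retriever overhead"]
processed = [preprocess(t) for t in texts]

path = os.path.join(tempfile.mkdtemp(), "samples.txt")
with open(path, "w", encoding="utf-8") as f:    # the standalone script's dump
    json.dump(processed, f)

with open(path, "r", encoding="utf-8") as f:    # the read inside from_texts
    loaded = json.load(f)

assert loaded == processed                       # identical token lists
```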
@firrice I got an error when running the preprocessing script:

```
2024-09-18 08:29:04.815 | WARNING | chatchat.server.utils:get_default_embedding:199 - default embedding model bge-m3 is not found in available embeddings, using bge-m3-1 instead
2024-09-18 08:29:04.892 | WARNING | chatchat.server.utils:get_default_embedding:199 - default embedding model bge-m3 is not found in available embeddings, using bge-m3-1 instead
2024-09-18 08:29:04.916 | WARNING | chatchat.server.utils:get_default_embedding:199 - default embedding model bge-m3 is not found in available embeddings, using bge-m3-1 instead
Traceback (most recent call last):
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 146, in __init__
    self._dbapi_connection = engine.raw_connection()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 3300, in raw_connection
    return self.pool.connect()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 449, in connect
    return _ConnectionFairy._checkout(self)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 1263, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 712, in checkout
    rec = pool._do_get()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 180, in _do_get
    self._dec_overflow()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/impl.py", line 177, in _do_get
    return self._create_connection()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 390, in _create_connection
    return _ConnectionRecord(self)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 674, in __init__
    self.__connect()
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 901, in __connect
    pool.logger.debug("Error on connect(): %s", e)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 146, in __exit__
    raise exc_value.with_traceback(exc_tb)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/pool/base.py", line 896, in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/engine/create.py", line 643, in connect
    return dialect.connect(*cargs, **cparams)
  File "/root/miniconda3/envs/chatchat/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 620, in connect
    return self.loaded_dbapi.connect(*cargs, **cparams)
sqlite3.OperationalError: unable to open database file

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preload.py", line 14, in
```
That's because you couldn't connect to the sqlite database, most likely a path problem (the preprocessing script assumes the v0.3.0 directory layout). Check whether the info.db for your knowledge base is at "***/libs/chatchat-server/chatchat/data/knowledge_base/info.db", and make sure you don't skip the os.environ[] line.
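A quick way to rule out the path problem before running the script is to check for the file directly (a trivial helper of my own, not part of chatchat):

```python
import os

def find_info_db(chatchat_root):
    """Return (exists, path) for the knowledge-base sqlite file under a
    v0.3.0-style CHATCHAT_ROOT directory layout."""
    path = os.path.join(chatchat_root, "data", "knowledge_base", "info.db")
    return os.path.isfile(path), path

# Example usage before the preprocessing script:
#   ok, path = find_info_db(os.environ["CHATCHAT_ROOT"])
#   if not ok:
#       raise SystemExit(f"info.db not found at {path}; fix CHATCHAT_ROOT")
```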
