Langchain-Chatchat
[BUG] Token limit error when using FastChat to serve the vicuna-13b model for knowledge-base Q&A
When I start the FastChat API service for vicuna-13b, configure it in config (the API was tested locally and returns results), and load the knowledge base (roughly 1,000+ documents; inference works fine with chatGLM), asking a question triggers a token-limit error even though I only sent "hello".
The error is: openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2359 tokens (1847 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)
The traceback points to:
File "/mnt/f/chatGPT/langchain-ChatGLM/chains/local_doc_qa.py", line 303, in get_knowledge_based_answer
    for answer_result in self.llm.generatorAnswer(prompt=prompt, history=chat_history,
File "/mnt/f/chatGPT/langchain-ChatGLM/models/fastchat_openai_llm.py", line 109, in generatorAnswer
    completion = openai.ChatCompletion.create(
Please take a look, thanks!
Hello, I found the bug; it comes from FastChat. The problem is in the function get_gen_params in fastchat.serve.openai_api_server.py. In that function, your message is appended to a default conversation template that is already too long (see fastchat.conversation.py, line 200, for an example). So even if you only send 'hello', you hit this error.
Sorry, I am writing in English because there is no Chinese keyboard on this system...
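A quick way to see this for yourself is to print the conversation template FastChat falls back to for your model. This is only a sketch against the fastchat.conversation API; the template name ("one_shot") is an assumption, and whether it carries a pre-filled example dialogue depends on your FastChat version.

```python
# Sketch: inspect the default conversation template FastChat uses.
# "one_shot" is an assumed template name; check which one your model actually resolves to.
from fastchat.conversation import get_conv_template

conv = get_conv_template("one_shot")
print(conv.messages)      # example turns already baked into the template, if any
print(conv.get_prompt())  # the full default prompt before your own message is added
```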
Same problem here. Were you able to load vicuna-13b on two GPUs? And how did you finally solve this?
Same problem here; has it been solved?
On my side it runs on a single GPU, so it will obviously run on two GPUs as well.
A lazy way to work around this is to add the line conv["messages"] = [] in fastchat.serve.openai_api_server.py around line 233, right after conv = await get_conv(model_name).
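For reference, a minimal sketch of that workaround (exact line numbers vary between FastChat versions; it simply discards the template's built-in example turns):

```python
# Inside get_gen_params in fastchat/serve/openai_api_server.py (sketch):
conv = await get_conv(model_name)
conv["messages"] = []  # drop the template's pre-filled example dialogue so it no longer consumes context
```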
This doesn't seem to solve the problem:
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2228 tokens (1716 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)
The FastChat side logs a bad request: INFO: 127.0.0.1:33084 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
Also, if the local knowledge base is the samples set that ships with the program, it does work.
Then I'm not sure either. I didn't use this repo directly; I only reused the part of its code that sends requests to FastChat. You could check what the final prompt actually looks like before it is sent, and whether some strange conversation got prepended to it. Otherwise your embedding chunk size may be too large; a 2048-token context is still quite small.
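If it helps, here is a rough sketch for measuring how big the final prompt really is before it is sent. The model path and the 512-token completion budget are assumptions taken from the error message above:

```python
# Sketch: count the tokens of the assembled prompt with the vicuna tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.3")  # assumed model path
prompt = "..."  # paste the final prompt built in get_knowledge_based_answer here

n_prompt = len(tokenizer.encode(prompt))
print(f"prompt tokens: {n_prompt}; with 512 completion tokens: {n_prompt + 512}")
if n_prompt + 512 > 2048:
    print("over the 2048-token context window -> reduce chunk size or VECTOR_SEARCH_TOP_K")
```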
I ran into the same problem. OP, did you manage to solve it?
Lower the VECTOR_SEARCH_TOP_K value.
Verified: this approach fixes the problem. In model_config.py the VECTOR_SEARCH_TOP_K value defaults to 5; I changed it to 1 for testing and the error no longer occurred. I haven't tried how high it can go before the error returns.
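For clarity, the change amounts to a single line in model_config.py (1 is just the value verified above; larger values may also fit depending on your chunk size):

```python
# model_config.py: fewer retrieved chunks go into the prompt, so it stays under 2048 tokens.
VECTOR_SEARCH_TOP_K = 1  # default is 5
```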