
[BUG] 用fastchat加载vicuna-13b模型进行知识库的问答有token的限制错误

Open alleniver opened this issue 1 year ago • 2 comments

After starting FastChat's vicuna-13b API service, configuring it in config (the API was tested locally and returns results), and loading the knowledge base (about 1,000+ documents, which works fine for inference with chatGLM), a token-limit error occurs as soon as I ask a question. I only asked a single "hello".

The error is as follows: openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2359 tokens (1847 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

Traceback location:
  File "/mnt/f/chatGPT/langchain-ChatGLM/chains/local_doc_qa.py", line 303, in get_knowledge_based_answer
    for answer_result in self.llm.generatorAnswer(prompt=prompt, history=chat_history,
  File "/mnt/f/chatGPT/langchain-ChatGLM/models/fastchat_openai_llm.py", line 109, in generatorAnswer
    completion = openai.ChatCompletion.create(

Please take a look, thanks!

alleniver avatar Jun 07 '23 08:06 alleniver

Hello, I found the bug: it comes from the FastChat side. The problem is in the function get_gen_params in fastchat.serve.openai_api_server.py. There, your message is appended to a default conversation template whose preset messages are already very long (see fastchat.conversation.py line 200 for an example), so even if you only send 'hello', the request exceeds the context limit.

Sorry, I am writing this in English because there is no Chinese keyboard on this system...
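
To illustrate, here is a rough sketch of what seems to happen (this is not FastChat's exact server code, and the "one_shot" template name is only an assumption, picked because it ships with preset demo messages):

    from fastchat.conversation import get_conv_template

    conv = get_conv_template("one_shot")   # assumed template; the server may pick another for vicuna-13b
    print(len(conv.messages))              # non-zero: the template already carries demo turns

    conv.append_message(conv.roles[0], "hello")
    conv.append_message(conv.roles[1], None)
    print(len(conv.get_prompt()))          # already long before any knowledge-base context is added

So the preset turns alone can eat most of vicuna's 2048-token window.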

PinganYANG avatar Jun 13 '23 12:06 PinganYANG

Same issue here. Can vicuna-13b be loaded on two GPUs? And how did you solve the problem in the end?

jiyif11 avatar Jun 15 '23 16:06 jiyif11

Same problem here. Has it been resolved?

hai4john avatar Jul 11 '23 09:07 hai4john

> Same issue here. Can vicuna-13b be loaded on two GPUs? And how did you solve the problem in the end?

It runs on a single GPU on my side, so it can obviously run on two GPUs as well.

hai4john avatar Jul 11 '23 09:07 hai4john

A lazy way to solve this is to add the line conv["messages"] = [] right after conv = await get_conv(model_name), around line 233 of fastchat.serve.openai_api_server.py.
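
In other words, the patched section would look roughly like this (the exact line number may differ between FastChat versions):

    conv = await get_conv(model_name)
    conv["messages"] = []  # drop the template's preset demo turns; keep only the messages from the request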

PinganYANG avatar Jul 11 '23 09:07 PinganYANG

This doesn't seem to solve the problem.

hai4john avatar Jul 11 '23 11:07 hai4john

openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2228 tokens (1716 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

The FastChat side logs a bad request: INFO: 127.0.0.1:33084 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

hai4john avatar Jul 11 '23 11:07 hai4john

Also, if the local knowledge base is the samples bundled with the program, it does work.

hai4john avatar Jul 11 '23 11:07 hai4john

Then I'm not sure either. I didn't use this repo directly; I only used the part of its code that sends requests to FastChat. You could check what the final prompt actually looks like and whether any strange conversation history gets appended. Otherwise, your embedding chunk size may be too large; a 2048-token context is quite small.
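
If it helps, here is a small, hypothetical debugging sketch you could drop into models/fastchat_openai_llm.py right before the openai.ChatCompletion.create call, to see how long the assembled prompt really is (the tokenizer path is an assumption; point it at your local vicuna-13b weights):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("/path/to/vicuna-13b")  # hypothetical local path

    def debug_prompt(messages, max_new_tokens=512, limit=2048):
        # Print per-message and total token counts before sending the request.
        total = 0
        for m in messages:
            n = len(tokenizer.encode(m["content"]))
            total += n
            print(f'{m["role"]}: {n} tokens')
        print(f"prompt tokens: {total}; room left for completion: {limit - total} (requested {max_new_tokens})")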

PinganYANG avatar Jul 11 '23 11:07 PinganYANG

I ran into the same problem. Has the original poster found a solution?

tanglu86 avatar Jul 19 '23 02:07 tanglu86

Lower the VECTOR_SEARCH_TOP_K value.

jiyif11 avatar Jul 19 '23 14:07 jiyif11

Verified: this approach solves the problem. In model_config.py, VECTOR_SEARCH_TOP_K defaults to 5; changing it to 1 during testing made the error go away. I haven't tried how high it can be raised before the error comes back.
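
For reference, the setting looks like this (the value 1 is just what was verified above; you can try raising it as long as the prompt stays under the limit):

    # model_config.py
    VECTOR_SEARCH_TOP_K = 1  # default is 5; fewer retrieved chunks keeps the prompt within vicuna-13b's 2048-token window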

hai4john avatar Jul 24 '23 02:07 hai4john