
[BUG] 用fastchat加载vicuna-13b模型进行知识库的问答有token的限制错误

Open alleniver opened this issue 1 year ago • 2 comments

After starting FastChat's vicuna-13b API service, configuring it in config (the API was tested locally and returns results), and loading the knowledge base (about 1,000+ documents, which works fine for inference with chatGLM), a token-limit error occurs as soon as I ask a question. I only asked a single "hello".

The error is as follows: openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2359 tokens (1847 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

Traceback location:
  File "/mnt/f/chatGPT/langchain-ChatGLM/chains/local_doc_qa.py", line 303, in get_knowledge_based_answer
    for answer_result in self.llm.generatorAnswer(prompt=prompt, history=chat_history,
  File "/mnt/f/chatGPT/langchain-ChatGLM/models/fastchat_openai_llm.py", line 109, in generatorAnswer
    completion = openai.ChatCompletion.create(

Please take a look, thanks!

alleniver avatar Jun 07 '23 08:06 alleniver

Hello, I found the bug: it comes from the FastChat side. The problem is in the function get_gen_params in fastchat.serve.openai_api_server.py. There, your message is appended to a default conversation template whose preset messages are already very long (see fastchat.conversation.py line 200 for an example), so even if you only send 'hello', the request exceeds the context limit.

Sorry, I am writing this in English because there is no Chinese keyboard on this system...
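
To illustrate, here is a rough sketch of what seems to happen (this is not FastChat's exact server code, and the "one_shot" template name is only an assumption, picked because it ships with preset demo messages):

    from fastchat.conversation import get_conv_template

    conv = get_conv_template("one_shot")   # assumed template; the server may pick another for vicuna-13b
    print(len(conv.messages))              # non-zero: the template already carries demo turns

    conv.append_message(conv.roles[0], "hello")
    conv.append_message(conv.roles[1], None)
    print(len(conv.get_prompt()))          # already long before any knowledge-base context is added

So the preset turns alone can eat most of vicuna's 2048-token window.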

PinganYANG avatar Jun 13 '23 12:06 PinganYANG

Same issue here. Can vicuna-13b be loaded on two GPUs? And how did you solve the problem in the end?

jiyif11 avatar Jun 15 '23 16:06 jiyif11

Same problem here. Has it been resolved?

hai4john avatar Jul 11 '23 09:07 hai4john

> Same issue here. Can vicuna-13b be loaded on two GPUs? And how did you solve the problem in the end?

It runs on a single GPU on my side, so it can obviously run on two GPUs as well.

hai4john avatar Jul 11 '23 09:07 hai4john

A lazy way to solve this is to add the line conv["messages"] = [] right after conv = await get_conv(model_name), around line 233 of fastchat.serve.openai_api_server.py.
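
In other words, the patched section would look roughly like this (the exact line number may differ between FastChat versions):

    conv = await get_conv(model_name)
    conv["messages"] = []  # drop the template's preset demo turns; keep only the messages from the request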

PinganYANG avatar Jul 11 '23 09:07 PinganYANG

This doesn't seem to solve the problem.

hai4john avatar Jul 11 '23 11:07 hai4john

openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2228 tokens (1716 in the messages, 512 in the completion). Please reduce the length of the messages or completion.","code":40303}' (HTTP response code was 400)

The FastChat side logs a bad request: INFO: 127.0.0.1:33084 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

hai4john avatar Jul 11 '23 11:07 hai4john

Also, if the local knowledge base is the samples bundled with the program, it does work.

hai4john avatar Jul 11 '23 11:07 hai4john

Then I'm not sure either. I didn't use this repo directly; I only used the part of its code that sends requests to FastChat. You could check what the final prompt actually looks like and whether any strange conversation history gets appended. Otherwise, your embedding chunk size may be too large; a 2048-token context is quite small.
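
If it helps, here is a small, hypothetical debugging sketch you could drop into models/fastchat_openai_llm.py right before the openai.ChatCompletion.create call, to see how long the assembled prompt really is (the tokenizer path is an assumption; point it at your local vicuna-13b weights):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("/path/to/vicuna-13b")  # hypothetical local path

    def debug_prompt(messages, max_new_tokens=512, limit=2048):
        # Print per-message and total token counts before sending the request.
        total = 0
        for m in messages:
            n = len(tokenizer.encode(m["content"]))
            total += n
            print(f'{m["role"]}: {n} tokens')
        print(f"prompt tokens: {total}; room left for completion: {limit - total} (requested {max_new_tokens})")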

PinganYANG avatar Jul 11 '23 11:07 PinganYANG

I ran into the same problem. Has the original poster found a solution?

tanglu86 avatar Jul 19 '23 02:07 tanglu86

Lower the VECTOR_SEARCH_TOP_K value.

jiyif11 avatar Jul 19 '23 14:07 jiyif11

Verified: this approach solves the problem. In model_config.py, VECTOR_SEARCH_TOP_K defaults to 5; changing it to 1 during testing made the error go away. I haven't tried how high it can be raised before the error comes back.
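
For reference, the setting looks like this (the value 1 is just what was verified above; you can try raising it as long as the prompt stays under the limit):

    # model_config.py
    VECTOR_SEARCH_TOP_K = 1  # default is 5; fewer retrieved chunks keeps the prompt within vicuna-13b's 2048-token window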

hai4john avatar Jul 24 '23 02:07 hai4john