John
Same issue here. Has it been resolved yet?
> Same issue. Were you able to load vicuna-13b on two GPUs? How did you finally resolve it?

On my side it runs on a single GPU, and it runs on two GPUs as well.
> A lazy way to solve this is to add a line in fastchat.serve.openai_api_server.py line 233 with `conv["messages"] = []` after `conv = await get_conv(model_name)`

This does not seem to solve the problem.
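For reference, a minimal sketch of what that suggested patch looks like, assuming the surrounding code matches the snippet quoted above (the exact line number varies across FastChat versions):

```python
# Excerpt from fastchat/serve/openai_api_server.py (around line 233 in the
# version discussed above; location differs across releases).
conv = await get_conv(model_name)
# The suggested "lazy fix": clear any cached conversation history so that
# previous turns do not count against the model's 2048-token context window.
conv["messages"] = []
```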
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model\'s maximum context length is 2048 tokens. However, you requested 2228 tokens (1716 in the messages, 512 in the completion). Please reduce the...
Also, if the local knowledge base is the bundled samples, it works.
> > > When I start FastChat's vicuna-13b API service, configure it in config (the API was tested locally and returns results), and load the knowledge base (1000+ documents; inference works fine with chatGLM), the token limit is exceeded as soon as I ask a question, even just "hello";
> > > The error is: openai.error.APIError: Invalid response object from API: '{"object":"error","message":"This model's maximum context length is 2048 tokens. However, you requested 2359 tokens (1847 in the messages,...
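Since the retrieved knowledge-base chunks are what push the prompt past 2048 tokens even for a one-word question, one workaround is to trim the retrieved context before it reaches the API. Below is a minimal client-side sketch, not part of this project: the model id `lmsys/vicuna-13b-v1.5`, the 512-token completion budget taken from the error message, and the `fit_chunks` helper are all illustrative assumptions.

```python
from transformers import AutoTokenizer

MAX_CONTEXT = 2048    # vicuna-13b context window reported in the error
MAX_COMPLETION = 512  # completion budget the API server requested

# Assumed tokenizer; swap in whichever vicuna checkpoint you actually serve.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.5")

def fit_chunks(question: str, chunks: list[str]) -> list[str]:
    """Keep only as many retrieved chunks (in relevance order) as fit
    within the prompt budget: context window minus completion minus question."""
    budget = MAX_CONTEXT - MAX_COMPLETION - len(tokenizer.encode(question))
    kept = []
    for chunk in chunks:
        n = len(tokenizer.encode(chunk))
        if n > budget:
            break  # stop at the first chunk that would overflow the budget
        kept.append(chunk)
        budget -= n
    return kept
```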
Note: this problem occurs in a K8s container but does not appear when running directly on a physical machine.