Qwen-Agent

The workstation editor submits data to the wrong URL, causing an HTTP 404 error

Open LaoK263 opened this issue 1 year ago • 11 comments

I am running the qwen2/Qwen2-beta-4B-Chat model in Ollama. After starting BrowserQwen, the editor on the workstation page submits data to the wrong URL. Ollama's API documentation says the completion endpoint is "/api/chat" (see: https://github.com/ollama/ollama/blob/main/docs/api.md), but when I click the "Continue" button I can see the data being posted to "/api/chat/completions", which results in an HTTP 404 error. The error reported on the BrowserQwen side:

Traceback (most recent call last):
  File "/home/kevin/Qwen-Agent/qwen_server/workstation_server.py", line 387, in generate
    for chunk in output_beautify.convert_to_full_str_stream(response):
  File "/home/kevin/Qwen-Agent/qwen_server/output_beautify.py", line 85, in convert_to_full_str_stream
    for message_list in message_list_stream:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/agents/article_agent.py", line 37, in _run
    for trunk in res:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/prompts/write_from_scratch.py", line 52, in _run
    for trunk in res_sum:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/base.py", line 423, in _convert_messages_iterator_to_target_type
    for messages in messages_iter:
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/oai.py", line 60, in _chat_stream
    response = self._chat_complete_create(model=self.model,
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/oai.py", line 49, in _chat_complete_create
    return client.chat.completions.create(*args, **kwargs)
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 889, in request
    return self._request(
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found

Error messages on the Ollama server side:

llama_new_context_with_model: graph splits (measure): 1
time=2024-03-01T18:23:49.073+08:00 level=INFO source=dyn_ext_server.go:161 msg="Starting llama main loop"
[GIN] 2024/03/01 - 18:23:49 | 200 | 5.4693876s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/03/01 - 18:25:59 | 404 | 2.7µs | 127.0.0.1 | POST "/api/chat/completions"

How can this be fixed? Could the completion URL be made configurable?

LaoK263 (Mar 01 '24)

According to https://github.com/ollama/ollama/blob/main/docs/openai.md, the api_base (also called base_url) should be set to something like http://localhost:11434/v1/. Note the /v1/ suffix; it may not be mentioned in the api.md document you referenced.
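As an illustration, here is a minimal sketch (not from the original exchange) of how the OpenAI Python SDK derives the final request URL from base_url; the host and port are placeholders matching the setup above:

from openai import OpenAI

# base_url with the /v1/ suffix (placeholder host/port matching Ollama's default)
client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')

# The SDK appends 'chat/completions' to base_url, so this client POSTs to
#   http://localhost:11434/v1/chat/completions
# which is the path served by Ollama's OpenAI-compatible layer. A base_url that
# does not end in /v1 (e.g. .../api) yields a path Ollama does not serve, hence the 404.
print(client.base_url)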

Note: I am installing Ollama now to try this myself; my network is a bit slow...

JianxinMa (Mar 01 '24)

The startup log of my Ollama instance shows that the base URL has no /v1/. I started BrowserQwen with the following command and still got the 404 error:

python3 run_server.py --llm qwen2/Qwen2-beta-4B-Chat --model_server http://127.0.0.1:11434/v1/api

Log on the Ollama side:

llama_new_context_with_model: graph splits (measure): 1
time=2024-03-04T11:34:04.291+08:00 level=INFO source=dyn_ext_server.go:161 msg="Starting llama main loop"
[GIN] 2024/03/04 - 11:34:04 | 200 | 4.3672342s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/03/04 - 11:35:33 | 200 | 3.4914991s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/03/04 - 11:38:33 | 404 | 2.9µs | 127.0.0.1 | POST "/v1/api/chat/completions"

So I think we should refer to their endpoint API documentation: https://github.com/ollama/ollama/blob/main/docs/api.md. I am also a newcomer, and I do not know whether Ollama can serve Qwen in an OpenAI-compatible API mode.

LaoK263 (Mar 04 '24)

Could you try --model_server http://127.0.0.1:11434/v1 ? Note that it ends with v1, not with v1/api.

JianxinMa (Mar 04 '24)

With --model_server http://127.0.0.1:11434/v1 I still get the same 404 error. The startup command was:

python3 run_server.py --llm qwen2/Qwen2-beta-4B-Chat --model_server http://127.0.0.1:11434/v1

Error seen on the Ollama server side:

llama_new_context_with_model: graph splits (measure): 1
time=2024-03-06T15:13:57.787+08:00 level=INFO source=dyn_ext_server.go:161 msg="Starting llama main loop"
[GIN] 2024/03/06 - 15:13:57 | 200 | 4.8649114s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/03/06 - 15:14:23 | 200 | 10.8918156s | 127.0.0.1 | POST "/api/chat"
[GIN] 2024/03/06 - 15:15:38 | 404 | 303.5µs | 127.0.0.1 | POST "/v1/chat/completions"

Error message on the workstation side:

Traceback (most recent call last):
  File "/home/kevin/Qwen-Agent/qwen_server/workstation_server.py", line 387, in generate
    for chunk in output_beautify.convert_to_full_str_stream(response):
  File "/home/kevin/Qwen-Agent/qwen_server/output_beautify.py", line 85, in convert_to_full_str_stream
    for message_list in message_list_stream:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/agents/article_agent.py", line 37, in _run
    for trunk in res:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/prompts/write_from_scratch.py", line 52, in _run
    for trunk in res_sum:
  File "/home/kevin/Qwen-Agent/qwen_agent/agent.py", line 65, in run
    for rsp in self._run(messages=new_messages, **kwargs):
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/base.py", line 423, in _convert_messages_iterator_to_target_type
    for messages in messages_iter:
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/oai.py", line 60, in _chat_stream
    response = self._chat_complete_create(model=self.model,
  File "/home/kevin/Qwen-Agent/qwen_agent/llm/oai.py", line 49, in _chat_complete_create
    return client.chat.completions.create(*args, **kwargs)
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/resources/chat/completions.py", line 663, in create
    return self._post(
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 1200, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 889, in request
    return self._request(
  File "/home/kevin/.local/lib/python3.8/site-packages/openai/_base_client.py", line 980, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: Error code: 404 - {'error': {'message': "model 'qwen2/Qwen2-beta-4B-Chat' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/blocks.py", line 1199, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/utils.py", line 519, in async_iteration
    return await iterator.__anext__()
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/utils.py", line 512, in __anext__
    return await anyio.to_thread.run_sync(
  File "/home/kevin/.local/lib/python3.8/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/kevin/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/home/kevin/.local/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/utils.py", line 495, in run_sync_iterator_async
    return next(iterator)
  File "/home/kevin/.local/lib/python3.8/site-packages/gradio/utils.py", line 649, in gen_wrapper
    yield from f(*args, **kwargs)
  File "/home/kevin/Qwen-Agent/qwen_server/workstation_server.py", line 392, in generate
    raise ValueError(ex)
ValueError: Error code: 404 - {'error': {'message': "model 'qwen2/Qwen2-beta-4B-Chat' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

LaoK263 (Mar 06 '24)

The error message contains:

{'error': {'message': "model 'qwen2/Qwen2-beta-4B-Chat' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

Judging from this message, the request is now reaching the correct address, but the Qwen model has not yet been set up on the Ollama side.

You can first follow https://qwen.readthedocs.io/en/latest/run_locally/ollama.html to make sure that running Qwen with Ollama works on its own.

Also, I am not sure whether --llm qwen2/Qwen2-beta-4B-Chat is correct; perhaps try --llm qwen:4b (I am not sure whether that is right either).
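As an aside (not part of the original exchange), one way to see the exact model tags that the local Ollama instance has installed is to query Ollama's native /api/tags endpoint; a minimal sketch, assuming the default host and port:

import requests

# GET /api/tags is Ollama's native endpoint for listing locally installed models.
resp = requests.get('http://127.0.0.1:11434/api/tags')
resp.raise_for_status()
for model in resp.json().get('models', []):
    print(model['name'])  # e.g. 'qwen:4b'; this tag is the value to pass via --llm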

P.S.: The company network is poor; I still have not managed to install Ollama...

JianxinMa (Mar 06 '24)

Hi, I have just finally managed to install Ollama, and got it working with a command like the following:

python run_server.py --llm qwen:0.5b --model_server http://127.0.0.1:11434/v1

JianxinMa (Mar 06 '24)

It still does not work on my side. The Qwen model can be started in Ollama and used normally; below is a conversation after the Qwen model started: (screenshot) When running ollama run qwen, the startup messages seen on the Ollama server side are as follows: (screenshot)

Combined with the 404 error on the Ollama server side, could it be that whenever BrowserQwen hits an HTTP 404 error, the standard error output always reports that the model was not found? I tried setting the --llm parameter to various values such as qwen2 and qwen2:4B, and got the same 404 error saying the model was not found:

ValueError: Error code: 404 - {'error': {'message': "model 'qwen2:4B' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

ValueError: Error code: 404 - {'error': {'message': "model 'qwen2' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

ValueError: Error code: 404 - {'error': {'message': "model 'qwen:4b' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}} ValueError: Error code: 404 - {'error': {'message': "model 'qwen:0.5b' not found, try pulling it first", 'type': 'api_error', 'param': None, 'code': None}}

LaoK263 (Mar 06 '24)

I ran the following, in order:

ollama serve
ollama run qwen:0.5b
# then exit with /bye
python run_server.py --llm qwen:0.5b --model_server http://127.0.0.1:11434/v1

and after that it worked.

JianxinMa (Mar 06 '24)

Yes, those are the commands for starting Ollama and running the model. The difference from your setup is that I used the 4b model; that is, ollama run qwen downloads and runs the 4b model by default, which is also why the --llm parameter in my BrowserQwen command differs from yours. What I would like to know is: in which source file can I change the completion request URL that is submitted to Ollama? I would like the request to go to "/v1/chat/" rather than "/v1/chat/completions". This may be the key to the problem.

LaoK263 (Mar 07 '24)

The request path is not something I wrote; it is built by the OpenAI SDK that Qwen-Agent calls, and the OpenAI SDK is what submits the request. See https://github.com/QwenLM/Qwen-Agent/blob/main/qwen_agent/llm/oai.py

Note: the openai SDK version I am using is 1.13.3.
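To make the point concrete, a minimal sketch (not from the original exchange): the /chat/completions suffix is fixed inside the OpenAI SDK, so the only configurable part is the base_url that --model_server ends up supplying:

from openai import OpenAI

# The value of --model_server is what ultimately reaches the SDK as base_url
# (see qwen_agent/llm/oai.py linked above).
client = OpenAI(base_url='http://127.0.0.1:11434/v1', api_key='none')

# client.chat.completions.create(...) always POSTs to {base_url}/chat/completions,
# i.e. http://127.0.0.1:11434/v1/chat/completions here; the "/chat/completions"
# suffix itself cannot be changed through configuration, only base_url can.
print(client.base_url)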

JianxinMa (Mar 07 '24)

I suggest first testing the example given in this document: https://github.com/ollama/ollama/blob/main/docs/openai.md:

from openai import OpenAI
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    # required but ignored
    api_key='ollama',
)
chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama2',  # change this to qwen:4b
)

Make sure this example runs successfully.

JianxinMa (Mar 07 '24)

Does this mean that Qwen must be served in an OpenAI-API-compatible mode for this to work?

LaoK263 (Mar 11 '24)

Yes. When deploying a model locally, the deployment must expose an OpenAI-compatible API; see https://github.com/QwenLM/Qwen-Agent/issues/95#issuecomment-1987785011. If you do not want to deploy your own model, you can use the cloud service provided by DashScope.
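For reference, a hedged sketch (based on the project's README, with placeholder model names and keys, not from the original exchange) of what the two options can look like as a Qwen-Agent LLM configuration:

from qwen_agent.llm import get_chat_model

# Option 1: a locally served model behind an OpenAI-compatible API,
# e.g. Ollama's /v1 endpoint as discussed above.
local_llm = get_chat_model({
    'model': 'qwen:0.5b',
    'model_server': 'http://127.0.0.1:11434/v1',
    'api_key': 'none',  # required by the OpenAI client but ignored by Ollama
})

# Option 2: DashScope's cloud service, with no local deployment.
cloud_llm = get_chat_model({
    'model': 'qwen-max',
    'model_server': 'dashscope',
    'api_key': 'YOUR_DASHSCOPE_API_KEY',  # placeholder
})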

JianxinMa (Mar 11 '24)