Zongru Wang
Same, I tried `--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja`, and got Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None',...
> I succeeded with
>
> * patch [[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled #19075](https://github.com/vllm-project/vllm/pull/19075)
> * `--enable-auto-tool-choice --tool-call-parser hermes --chat-template tool_chat_template_deepseekr1.jinja`, template from...
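Putting the pieces above together, the serve invocation that worked looked roughly like the following. This is a sketch assuming the model and template paths from earlier comments; the hermes parser matched this distill's tool-call format, whereas deepseek_v3 could not find its start/end tokens in the tokenizer:

```shell
# Sketch of the working setup; model and template paths are placeholders.
vllm serve /path/to/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat-template tool_chat_template_deepseekr1.jinja
```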
> [@alllexx88](https://github.com/alllexx88) [@Zongru-Wang](https://github.com/Zongru-Wang) I updated my comment with the setup & test script; you can give it a try. I use `vllm serve path/to/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B --tensor-parallel-size 4 --host 0.0.0.0 --port 10001 --api-key xxxx`...
> > {"object":"error","message":"1 validation error for list[function-wrap[**log_extra_fields**()]]\n Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='\n好的,用户...": {"city": "上海"}}]', input_type=str]\n For further information visit https://errors.pydantic.dev/2.10/v/json_invalid","type":"BadRequestError","param":null,"code":400}
> >
> > I’ve already...
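That 400 is the server failing to validate the model's raw output as tool-call arguments JSON: the reasoning text (`'\n好的,用户...'`) precedes the JSON object, so validation trips on the very first character. A minimal sketch of the failure mode using plain `json` (no vLLM involved), which is why stripping the reasoning segment first matters:

```python
import json

# Arguments JSON in the shape the tool parser expects.
good = '{"city": "上海"}'

# What the model actually emitted: reasoning text ahead of the JSON.
# A reasoning parser is supposed to strip this prefix before validation.
bad = '\n好的,用户... {"city": "上海"}'

json.loads(good)  # parses fine

try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # Same class of error the server reports: "expected value" at the start.
    print("invalid JSON:", e.msg)
```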
> [@ZeppLu](https://github.com/ZeppLu) [@chaunceyjiang](https://github.com/chaunceyjiang) Hello, I can reproduce what you're seeing with the OpenAI-compatible API server and a langchain client. The key difference is that you're setting `"tool_choice": "required"`. If I...
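For context, the client-side difference that triggers the bug is a single field in the request body. A minimal illustration (the tool schema here is made up, not taken from the original test script):

```python
# Illustrative tool schema; not from the original reproduction script.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Default behaviour: the model may or may not emit a tool call.
auto_request = {"tools": [tool], "tool_choice": "auto"}

# The failing case: the model is forced to emit a tool call,
# which is the code path addressed by #19075 when thinking is enabled.
required_request = {"tools": [tool], "tool_choice": "required"}
```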
> [@Zongru-Wang](https://github.com/Zongru-Wang) The error you're facing with "required" tools is fixed in an already merged commit [#19075](https://github.com/vllm-project/vllm/pull/19075) > > You can install a nightly wheel that should have it fixed,...
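Installing a nightly wheel typically looks like the following; the index URL is taken from the vLLM installation docs, but double-check the current instructions before relying on it:

```shell
# Install the latest nightly vLLM wheel (includes merged but unreleased fixes).
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```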
> [@Zongru-Wang](https://github.com/Zongru-Wang) in theory, changing `serving_chat.py` is the fix. The updated wheel already includes that fix; it's just more reliable than manually patching an installed vllm. If it still...
> [@Zongru-Wang](https://github.com/Zongru-Wang) can you share a full minimal working example that reproduces your issue so I can try it myself? > > Also, I still don't consider the issue fully...
> [@Zongru-Wang](https://github.com/Zongru-Wang) you forgot a return statement in `call_openai_api_stream()`, which made `tell_joke` fail. After adding it, all works fine with `"tool_choice": "required"` (just reasoning parsing in `tell_joke` fails sometimes, printing...
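The forgotten-return failure mode is easy to hit with streaming helpers: the function does all the work but hands `None` back to the caller. A minimal sketch (function names are hypothetical, chosen only to mirror the shape of the bug in the test script):

```python
def collect_stream(chunks):
    """Stand-in for consuming a response stream into a full message."""
    return "".join(chunks)

def call_openai_api_stream_buggy(chunks):
    # Bug: the result is computed but never returned -> caller receives None,
    # so downstream code (e.g. the tell_joke check) sees no output at all.
    collect_stream(chunks)

def call_openai_api_stream_fixed(chunks):
    # Fix: propagate the collected result to the caller.
    return collect_stream(chunks)

print(call_openai_api_stream_buggy(["tell", "_joke"]))  # None
print(call_openai_api_stream_fixed(["tell", "_joke"]))  # tell_joke
```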
> I also reproduced this, and it's also very slow. I'm on 2× A100 GPUs with the same model you're using; I ask a question and it takes ages to produce a single character. Still unsolved; I'm following this issue, and if you figure it out please share the solution.
>
> A few things to check: one is the CPU model, and setting the core count to the number of physical cores; the second is the server's memory bandwidth (it's only maximal with all memory slots populated; my configuration works out to roughly 384 GB/s of memory bandwidth). Beyond that I'm not sure.
>
> I'm using v0.2.1; for v0.3 I'm downloading the full BF16 model, and the environment is already set up, so I'll try again then. `export USE_NUMA=1` (I have 2 NUMA nodes and I'm not sure what to set here; the CPU only reaches full load when I `export USE_NUMA=2`) `ktransformers --model_path xxx/DeepSeek-R1-GGUF --gguf_path xxx/unsloth/DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M --cpu_infer 97 --max_new_tokens 4000` GPU:...