Zongru Wang
Same, I tried `--tool-call-parser deepseek_v3 --chat-template examples/tool_chat_template_deepseekr1.jinja`, and got Error code: 400 - {'object': 'error', 'message': 'DeepSeek-V3 Tool parser could not locate tool call start/end tokens in the tokenizer! None',...
> I succeeded with
>
> * patch [[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled #19075](https://github.com/vllm-project/vllm/pull/19075)
> * `--enable-auto-tool-choice --tool-call-parser hermes --chat-template tool_chat_template_deepseekr1.jinja`, template from...
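Putting the pieces above together, the serve invocation that worked looked roughly like the following. This is a sketch assuming the model and template paths from earlier comments; the hermes parser matched this distill's tool-call format, whereas deepseek_v3 could not find its start/end tokens in the tokenizer:

```shell
# Sketch of the working setup; model and template paths are placeholders.
vllm serve /path/to/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --chat-template tool_chat_template_deepseekr1.jinja
```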
> [@alllexx88](https://github.com/alllexx88) [@Zongru-Wang](https://github.com/Zongru-Wang) I updated my comment with the setup & test script; you can give it a try. I use `vllm serve path/to/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B --tensor-parallel-size 4 --host 0.0.0.0 --port 10001 --api-key xxxx`...
> > {"object":"error","message":"1 validation error for list[function-wrap[**log_extra_fields**()]]\n Invalid JSON: expected value at line 1 column 1 [type=json_invalid, input_value='\n好的,用户...": {"city": "上海"}}]', input_type=str]\n For further information visit https://errors.pydantic.dev/2.10/v/json_invalid","type":"BadRequestError","param":null,"code":400}
> >
> > I’ve already...
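That 400 is the server failing to validate the model's raw output as tool-call arguments JSON: the reasoning text (`'\n好的,用户...'`) precedes the JSON object, so validation trips on the very first character. A minimal sketch of the failure mode using plain `json` (no vLLM involved), which is why stripping the reasoning segment first matters:

```python
import json

# Arguments JSON in the shape the tool parser expects.
good = '{"city": "上海"}'

# What the model actually emitted: reasoning text ahead of the JSON.
# A reasoning parser is supposed to strip this prefix before validation.
bad = '\n好的,用户... {"city": "上海"}'

json.loads(good)  # parses fine

try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # Same class of error the server reports: "expected value" at the start.
    print("invalid JSON:", e.msg)
```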
> [@ZeppLu](https://github.com/ZeppLu) [@chaunceyjiang](https://github.com/chaunceyjiang) Hello, I can reproduce what you're seeing with the OpenAI-compatible API server and a langchain client. The key difference is that you're setting `"tool_choice": "required"`. If I...
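For context, the client-side difference that triggers the bug is a single field in the request body. A minimal illustration (the tool schema here is made up, not taken from the original test script):

```python
# Illustrative tool schema; not from the original reproduction script.
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Default behaviour: the model may or may not emit a tool call.
auto_request = {"tools": [tool], "tool_choice": "auto"}

# The failing case: the model is forced to emit a tool call,
# which is the code path addressed by #19075 when thinking is enabled.
required_request = {"tools": [tool], "tool_choice": "required"}
```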
> [@Zongru-Wang](https://github.com/Zongru-Wang) The error you're facing with "required" tools is fixed in an already merged commit [#19075](https://github.com/vllm-project/vllm/pull/19075) > > You can install a nightly wheel that should have it fixed,...
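Installing a nightly wheel typically looks like the following; the index URL is taken from the vLLM installation docs, but double-check the current instructions before relying on it:

```shell
# Install the latest nightly vLLM wheel (includes merged but unreleased fixes).
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```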
> [@Zongru-Wang](https://github.com/Zongru-Wang) in theory, changing `serving_chat.py` is the fix. The updated wheel already includes that fix; it's just more reliable than manually patching an installed vllm. If it still...
> [@Zongru-Wang](https://github.com/Zongru-Wang) can you share a full minimal working example that reproduces your issue so I can try it myself? > > Also, I still don't consider the issue fully...
> [@Zongru-Wang](https://github.com/Zongru-Wang) you forgot a return statement in `call_openai_api_stream()`, which made `tell_joke` fail. After adding it, all works fine with `"tool_choice": "required"` (just reasoning parsing in `tell_joke` fails sometimes, printing...
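The forgotten-return failure mode is easy to hit with streaming helpers: the function does all the work but hands `None` back to the caller. A minimal sketch (function names are hypothetical, chosen only to mirror the shape of the bug in the test script):

```python
def collect_stream(chunks):
    """Stand-in for consuming a response stream into a full message."""
    return "".join(chunks)

def call_openai_api_stream_buggy(chunks):
    # Bug: the result is computed but never returned -> caller receives None,
    # so downstream code (e.g. the tell_joke check) sees no output at all.
    collect_stream(chunks)

def call_openai_api_stream_fixed(chunks):
    # Fix: propagate the collected result to the caller.
    return collect_stream(chunks)

print(call_openai_api_stream_buggy(["tell", "_joke"]))  # None
print(call_openai_api_stream_fixed(["tell", "_joke"]))  # tell_joke
```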
> I also reproduced this, and it's also very slow. I'm on 2× A100 GPUs with the same model you're using; I ask a question and it takes ages to produce a single character. Still unsolved; I'm following this issue, and if you figure it out please share the solution.
>
> A few things to check: one is the CPU model, and setting the core count to the number of physical cores; the second is the server's memory bandwidth (it's only maximal with all memory slots populated; my configuration works out to roughly 384 GB/s of memory bandwidth). Beyond that I'm not sure.
>
> I'm using v0.2.1; for v0.3 I'm downloading the full BF16 model, and the environment is already set up, so I'll try again then. `export USE_NUMA=1` (I have 2 NUMA nodes and I'm not sure what to set here; the CPU only reaches full load when I `export USE_NUMA=2`) `ktransformers --model_path xxx/DeepSeek-R1-GGUF --gguf_path xxx/unsloth/DeepSeek-R1-GGUF/DeepSeek-R1-Q4_K_M --cpu_infer 97 --max_new_tokens 4000` GPU:...