GLM-4 GLM-Z1-32B-0414: reasoning output is broken when deployed with vLLM

System Info

The deployment parameters are as follows: python3 -m vllm.entrypoints.openai.api_server --disable-log-requests --host 0.0.0.0 --port 8080 --model THUDM/GLM-Z1-32B-0414 --served-model-name thudm/glm-z1-32b-0414 --max-num-seqs 16 --gpu-memory-utilization 0.92 --max-model-len 32768 --tensor-parallel-size 1 --chat-template-content-format auto --enable-prefix-caching --enable-chunked-prefill --enable-auto-tool-choice --tool-call-parser pythonic

The response contains only one half of the reasoning tag pair, so the thinking part cannot be reliably distinguished from the answer part.

Who can help?

No response

Information

[ ] The official example scripts
[ ] My own modified scripts

Reproduction

Start the server with the following parameters: python3 -m vllm.entrypoints.openai.api_server --disable-log-requests --host 0.0.0.0 --port 8080 --model THUDM/GLM-Z1-32B-0414 --served-model-name thudm/glm-z1-32b-0414 --max-num-seqs 16 --gpu-memory-utilization 0.92 --max-model-len 32768 --tensor-parallel-size 1 --chat-template-content-format auto --enable-prefix-caching --enable-chunked-prefill --enable-auto-tool-choice --tool-call-parser pythonic

Send any chat request to the server. The response contains only one half of the reasoning tag pair, so the thinking part cannot be reliably distinguished from the answer part.
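
Until the cause is found, a client-side split might serve as a stopgap. The sketch below is only an illustration under assumptions that the report does not confirm: that the tags are `<think>` and `</think>`, and that the half still present in the response is the closing tag. `split_reasoning` is a hypothetical helper, not part of vLLM or the GLM tooling.

```python
from typing import Tuple

# Assumed tag names; the report does not state which tags the model emits.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Tolerates a missing opening tag: everything before the first
    closing tag is treated as reasoning.
    """
    head, sep, tail = text.partition(THINK_CLOSE)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    reasoning = head.strip()
    if reasoning.startswith(THINK_OPEN):
        # The opening tag is present after all, so drop it.
        reasoning = reasoning[len(THINK_OPEN):].strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "step 1\nstep 2</think>\nFinal answer"
    reasoning, answer = split_reasoning(raw)
    print("reasoning:", repr(reasoning))
    print("answer:", repr(answer))
```

If the half that survives turns out to be the opening tag instead, the same idea applies with the roles swapped, though the boundary between thinking and answer would then have to be guessed.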