GLM-Z1 produces garbled output during batch inference with vLLM
System Info
vllm==0.8.5.dev92+g5c9121203.precompiled
transformers==4.51.3
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts and tasks
Reproduction
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "/models/THUDM/GLM-Z1-32B-0414"
# Tokenizer is needed for the eos stop token below.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

engine_args = {
    "model": model_path,
    "trust_remote_code": True,
    "max_num_seqs": None,
    "max_model_len": 11264,
    "tensor_parallel_size": 2,
    "pipeline_parallel_size": 1,
    "disable_log_stats": True,
    "enable_lora": False,
}
llm = LLM(**engine_args)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    stop_token_ids=[tokenizer.eos_token_id],
    max_tokens=1024,
    seed=2025,
)

prompts = ["", "", ...]  # 1000 prompts

pred_results = llm.generate(prompts, sampling_params)

# Batch prediction output is abnormal
pred_results[11]  # sampled the 11th result

# Single-prompt prediction is normal
llm.generate(prompts[11], sampling_params)
```
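For reference, a minimal check that isolates the discrepancy is to compare the batch output for one index against a fresh single-prompt run of the same prompt. This is a sketch, assuming the reproduction script above has already run (so `llm`, `prompts`, `sampling_params`, and `pred_results` exist); the generated text is read from `outputs[0].text` on each vLLM `RequestOutput`.

```python
# Sketch: compare the batch generation vs. a single-prompt generation
# for the same prompt index. Assumes the objects from the script above.
idx = 11
batch_text = pred_results[idx].outputs[0].text

single_result = llm.generate([prompts[idx]], sampling_params)
single_text = single_result[0].outputs[0].text

print("match :", batch_text == single_text)
print("batch :", batch_text[:200])
print("single:", single_text[:200])
```

Note that even with a fixed seed, exact equality across different batch sizes is not guaranteed, since batching can perturb logits slightly and sampled token streams may diverge; the check is still useful for spotting output that is clearly garbled rather than merely different.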
Expected behavior
With the same approach and data, batch inference with QwQ-32B or GLM-4-32B-0414 does not show this problem.
This looks like a vLLM issue. You can ask in the vLLM repository; we will track this issue and submit a PR in the vLLM repo to fix it.