MiniCPM-V
vLLM: inference with AsyncLLMEngine gives incorrect results
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in FAQ?
- [X] I have searched FAQ
### Current Behavior
Inference with the MiniCPM-Llama3-V-2_5 model works correctly using the following script:
```python
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "/model"
# Also available for previous models
# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
# MODEL_NAME = "HwwwH/MiniCPM-V-2"

if __name__ == "__main__":
    image = Image.open("/app/fruit_stand.jpg").convert("RGB")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    llm = LLM(
        model=MODEL_NAME,
        trust_remote_code=True,
        gpu_memory_utilization=1,
        max_model_len=2048,
        tensor_parallel_size=2,
    )

    messages = [{
        "role": "user",
        # The number of `(<image>./</image>)` placeholders must match the number of images
        "content": "(<image>./</image>)" + "\n这是一张什么图片?",  # "What is this image?"
    }]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

    # Single inference
    inputs = {
        "prompt": prompt,
        "multi_modal_data": {
            "image": image
            # Multiple images: the number of images must equal the number
            # of `(<image>./</image>)` placeholders in the prompt
            # "image": [image, image]
        },
    }
    # Batch inference
    # inputs = [{
    #     "prompt": prompt,
    #     "multi_modal_data": {
    #         "image": image
    #     },
    # } for _ in range(2)]

    # 2.6
    # stop_tokens = ['<|im_end|>', '<|endoftext|>']
    # stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
    # 2.0
    # stop_token_ids = [tokenizer.eos_id]
    # 2.5
    stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]

    sampling_params = SamplingParams(
        stop_token_ids=stop_token_ids,
        use_beam_search=True,
        temperature=0,
        best_of=3,
        max_tokens=1024,
    )

    outputs = llm.generate(inputs, sampling_params=sampling_params)
    print(outputs[0].outputs[0].text)
```
However, when AsyncLLMEngine is used instead of LLM, the inference results are incorrect.
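For reference, a minimal sketch of the async path that should be equivalent (an assumption on my part, not the exact failing code; it uses vLLM 0.5.4's `AsyncEngineArgs` / `AsyncLLMEngine.from_engine_args` API, and reuses `MODEL_NAME`, `prompt`, `image`, and `sampling_params` built exactly as in the script above):

```python
import asyncio

from vllm import AsyncEngineArgs, AsyncLLMEngine

# Assumption: MODEL_NAME, prompt, image, and sampling_params are
# constructed exactly as in the synchronous script above.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model=MODEL_NAME,
        trust_remote_code=True,
        gpu_memory_utilization=1,
        max_model_len=2048,
        tensor_parallel_size=2,
    )
)

async def run() -> None:
    inputs = {
        "prompt": prompt,
        "multi_modal_data": {"image": image},
    }
    final_output = None
    # generate() yields incremental RequestOutputs; keep the last one
    async for request_output in engine.generate(
        inputs, sampling_params, request_id="0"
    ):
        final_output = request_output
    print(final_output.outputs[0].text)

asyncio.run(run())
```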
### Expected Behavior
_No response_
### Steps To Reproduce
_No response_
### Environment
```Markdown
- OS: CentOS
- Python: 3.10
- Transformers: 4.44.0
- PyTorch: 2.4.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
- vLLM: 0.5.4
```
### Anything else?
_No response_