TensorRT-LLM
Converting Qwen2.5-VL fails
System Info
- Architecture: x86_64
- RAM: 128 GB
- GPU: RTX 3090 (24 GB)
- TensorRT-LLM 0.19.0
- CUDA 12.8.93
- Host OS: Ubuntu 20.04
- Host GPU driver: 550.144.03
- TensorRT 10.9.0.34
- cuBLAS 12.8.4.1
- Container: tensorrt_llm/release:latest (CONTAINER ID ec1bbab4b4aa)
Who can help?
@ncomly-nvidia
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- make -C docker release_run
- root@dzy-3660-release:/app/tensorrt_llm/examples/qwenvl# python3 vit_onnx_trt.py --pretrained_model_path /code/tensorrt_llm/Qwen2.5-VL-3B-Instruct
Expected behavior
Run Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-72B-Instruct with TensorRT-LLM.
actual behavior
2025-05-16 01:19:01,052 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0
Start converting ONNX model!
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/qwenvl/vit_onnx_trt.py", line 192, in
additional notes
Should use Qwen2_5_VLForConditionalGeneration to load the model.
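As a reference, here is a minimal sketch of loading the checkpoint with the dedicated Qwen2.5-VL class from transformers (requires a recent transformers release; the model path and dtype below are placeholders, not values taken from vit_onnx_trt.py):

```python
# Minimal sketch: load Qwen2.5-VL with Qwen2_5_VLForConditionalGeneration
# instead of the Qwen2-VL class. Path and dtype are placeholders.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_path = "/code/tensorrt_llm/Qwen2.5-VL-3B-Instruct"  # adjust to your checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_path)
```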
I'm guessing you're using the C++ workflow? Qwen2.5-VL is supported in our PyTorch workflow. Please give it a shot:
- https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/pytorch#supported-models
- https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/models/modeling_qwen2vl.py#L453
Could you please provide the model conversion script for Qwen2.5-VL? When using https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/qwenvl/vit_onnx_trt.py, errors occur when exporting the ONNX graph and building the TRT engine.
I also used https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/multimodal/build_multimodal_engine.py to convert the model, but it does not support --model_type qwen2_5_vl.
Hi, looks like you're still following the classical TRT codepath. I don't see a mention of qwen2_5_vl in multimodal_model_builder.py. So, I don't think it's supported there.
Can you please try out the pytorch codepath following the links I shared in my last comment?
Hi, thank you for your suggestion. I can start the model directly with:
trtllm-serve model_path \
--backend pytorch
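A quick way to confirm the server is reachable, assuming trtllm-serve is using its default OpenAI-compatible endpoint on localhost:8000 (adjust base_url if you passed a different host or port):

```python
# Sanity check against the OpenAI-compatible endpoint exposed by trtllm-serve.
# The base_url assumes the default localhost:8000; the api_key is a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
print(client.models.list())
```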
Hi, do you know how to send a request to the HTTP service? I tried this method, but it didn't work:
response = client.chat.completions.create(
model="Qwen2.5-VL-7B-Instruct",
messages=[{
"role": "user",
"content": [
{
"type": "image",
"image": "file:///demo.jpeg",
},
{"type": "text", "text": "Describe this image."},
]
}],
max_tokens=20,
)
I pass the image through image_url. You can convert a local image to base64 for this:
{
"model": "Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}}
]
}
]
}
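For completeness, a minimal Python sketch of that request with the OpenAI client (the base_url/port and model name are assumptions about how trtllm-serve was started; as the comments below note, the serve backend may still reject this request shape):

```python
# Sketch: send a local image as a base64 data URL via image_url.
# base_url and model name are assumptions; adjust to your deployment.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Encode the local image as a base64 data URL.
with open("demo.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=20,
)
print(response.choices[0].message.content)
```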
Yeah, I also tried the base64 format, but it reported:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'image_url is not supported', 'type': 'BadRequestError', 'param': None, 'code': 400}
me too...
Hi, it was a problem with my wording. When using PyTorch as the backend, I get the same error as you, but it works well with offline inference. P.S.: the conversion needs to be carried out before offline inference. P.P.S.: there are still problems when deploying serve with the converted model.