TensorRT-LLM
Converting Qwen2.5-VL fails
System Info
- Architecture: x86_64
- RAM: 128 GB
- GPU: RTX 3090 (24 GB)
- TensorRT-LLM 0.19.0
- CUDA 12.8.93
- Host OS: Ubuntu 20.04
- Host GPU driver: 550.144.03
- TensorRT 10.9.0.34
- cuBLAS 12.8.4.1
- Container: tensorrt_llm/release:latest (CONTAINER ID ec1bbab4b4aa)
Who can help?
@ncomly-nvidia
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- make -C docker release_run
- root@dzy-3660-release:/app/tensorrt_llm/examples/qwenvl# python3 vit_onnx_trt.py --pretrained_model_path /code/tensorrt_llm/Qwen2.5-VL-3B-Instruct
Expected behavior
Run Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-72B-Instruct with TensorRT-LLM.
actual behavior
2025-05-16 01:19:01,052 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0
Start converting ONNX model!
Traceback (most recent call last):
File "/app/tensorrt_llm/examples/qwenvl/vit_onnx_trt.py", line 192, in
additional notes
Should use Qwen2_5_VLForConditionalGeneration to load the model.
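As a reference, here is a minimal sketch of loading the checkpoint with the dedicated Qwen2.5-VL class from transformers (requires a recent transformers release; the model path and dtype below are placeholders, not values taken from vit_onnx_trt.py):

```python
# Minimal sketch: load Qwen2.5-VL with Qwen2_5_VLForConditionalGeneration
# instead of the Qwen2-VL class. Path and dtype are placeholders.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_path = "/code/tensorrt_llm/Qwen2.5-VL-3B-Instruct"  # adjust to your checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_path)
```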
I'm guessing you're using the C++ workflow? Qwen2.5-VL is supported in our PyTorch workflow. Please give it a shot:
- https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/pytorch#supported-models
- https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/models/modeling_qwen2vl.py#L453
Could you please provide the model conversion script for Qwen2.5-VL? When using https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/qwenvl/vit_onnx_trt.py, errors occur when exporting the ONNX graph and building the TRT engine.
I also used https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/multimodal/build_multimodal_engine.py to convert the model, but it does not support --model_type qwen2_5_vl.
Hi, looks like you're still following the classical TRT codepath. I don't see a mention of qwen2_5_vl in multimodal_model_builder.py. So, I don't think it's supported there.
Can you please try out the pytorch codepath following the links I shared in my last comment?
Hi, thank you for your suggestion. I can start the model directly with:
trtllm-serve model_path \
--backend pytorch
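A quick way to confirm the server is reachable, assuming trtllm-serve is using its default OpenAI-compatible endpoint on localhost:8000 (adjust base_url if you passed a different host or port):

```python
# Sanity check against the OpenAI-compatible endpoint exposed by trtllm-serve.
# The base_url assumes the default localhost:8000; the api_key is a dummy value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
print(client.models.list())
```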
Hi, do you know how to send a request to the HTTP service? I tried this method, but it didn't work:
response = client.chat.completions.create(
model="Qwen2.5-VL-7B-Instruct",
messages=[{
"role": "user",
"content": [
{
"type": "image",
"image": "file:///demo.jpeg",
},
{"type": "text", "text": "Describe this image."},
]
}],
max_tokens=20,
)
I pass the image through image_url. You can convert a local image to base64 for this:
{
"model": "Qwen2.5-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": "What is in this image?"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}}
]
}
]
}
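For completeness, a minimal Python sketch of that request with the OpenAI client (the base_url/port and model name are assumptions about how trtllm-serve was started; as the comments below note, the serve backend may still reject this request shape):

```python
# Sketch: send a local image as a base64 data URL via image_url.
# base_url and model name are assumptions; adjust to your deployment.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Encode the local image as a base64 data URL.
with open("demo.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=20,
)
print(response.choices[0].message.content)
```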
Yeah, I also tried the base64 format, but it reported:
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'image_url is not supported', 'type': 'BadRequestError', 'param': None, 'code': 400}
me too...
Hi, it was a problem with my wording. When using PyTorch as the backend, I get the same error as you, but it works well with offline inference. P.S.: the conversion needs to be carried out before offline inference. P.P.S.: there are still problems when deploying serve with the converted model.