
convert qwen2.5-VL fail

Open · dzy130120 opened this issue 6 months ago

System Info

  • Host: x86_64, 128 GB RAM
  • GPU: RTX 3090 24 GB
  • TensorRT-LLM: 0.19.0
  • CUDA: 12.8.93
  • Host OS: Ubuntu 20.04
  • Host GPU driver: 550.144.03
  • TensorRT: 10.9.0.34
  • cuBLAS: 12.8.4.1
  • Container: ec1bbab4b4aa (tensorrt_llm/release:latest)

Who can help?

@ncomly-nvidia

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

  1. make -C docker release_run
  2. root@dzy-3660-release:/app/tensorrt_llm/examples/qwenvl# python3 vit_onnx_trt.py --pretrained_model_path /code/tensorrt_llm/Qwen2.5-VL-3B-Instruct

(screenshot attached)

Expected behavior

Run Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-72B-Instruct with TensorRT-LLM.

actual behavior

2025-05-16 01:19:01,052 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.19.0
Start converting ONNX model!
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/qwenvl/vit_onnx_trt.py", line 192, in <module>
    onnx_trt_obj.export_onnx(args.onnxFile, args.pretrained_model_path,
  File "/app/tensorrt_llm/examples/qwenvl/vit_onnx_trt.py", line 66, in export_onnx
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 574, in from_pretrained
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers.models.qwen2_5_vl.configuration_qwen2_5_vl.Qwen2_5_VLConfig'> for this kind of AutoModel: AutoModelForCausalLM.
Model type should be one of AriaTextConfig, BambaConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BloomConfig, CamembertConfig, LlamaConfig, CodeGenConfig, CohereConfig, Cohere2Config, CpmAntConfig, CTRLConfig, Data2VecTextConfig, DbrxConfig, DeepseekV3Config, DiffLlamaConfig, ElectraConfig, Emu3Config, ErnieConfig, FalconConfig, FalconMambaConfig, FuyuConfig, GemmaConfig, Gemma2Config, Gemma3Config, Gemma3TextConfig, GitConfig, GlmConfig, GotOcr2Config, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GraniteConfig, GraniteMoeConfig, GraniteMoeSharedConfig, HeliumConfig, JambaConfig, JetMoeConfig, LlamaConfig, Llama4Config, Llama4TextConfig, MambaConfig, Mamba2Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MllamaConfig, MoshiConfig, MptConfig, MusicgenConfig, MusicgenMelodyConfig, MvpConfig, NemotronConfig, OlmoConfig, Olmo2Config, OlmoeConfig, OpenLlamaConfig, OpenAIGPTConfig, OPTConfig, PegasusConfig, PersimmonConfig, PhiConfig, Phi3Config, Phi4MultimodalConfig, PhimoeConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, Qwen3Config, Qwen3MoeConfig, RecurrentGemmaConfig, ReformerConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, Speech2Text2Config, StableLmConfig, Starcoder2Config, TransfoXLConfig, TrOCRConfig, WhisperConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, ZambaConfig, Zamba2Config.

additional notes

Qwen2_5_VLForConditionalGeneration should be used to load the model.
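
For reference, a minimal sketch of that fix (illustrative only, not the shipped script; it assumes a transformers release that includes the Qwen2.5-VL classes, and the model.visual attribute name is taken from the Qwen2-VL model layout):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Load the checkpoint with the Qwen2.5-VL-specific class; AutoModelForCausalLM
# does not recognize Qwen2_5_VLConfig, which is what triggers the ValueError above.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "/code/tensorrt_llm/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.float16,
    device_map="cuda",
)

# The vision transformer that vit_onnx_trt.py would export to ONNX is the
# `visual` submodule (attribute name assumed from the Qwen2-VL implementation).
vision_encoder = model.visual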

dzy130120 · May 16 '25

I'm guessing you're using the C++ workflow? Qwen2.5-VL is supported in our PyTorch workflow. Please give it a shot:

  • https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/pytorch#supported-models
  • https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/_torch/models/modeling_qwen2vl.py#L453

brb-nv · May 16 '25

Could you please provide the model conversion script for Qwen2.5-VL? When using https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/qwenvl/vit_onnx_trt.py, errors occur when exporting the ONNX and TRT graphs.

I also tried https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/models/core/multimodal/build_multimodal_engine.py to convert the model, but it does not support --model_type qwen2_5_vl.

st-wps · Jun 04 '25

Hi, looks like you're still following the classical TRT codepath. I don't see a mention of qwen2_5_vl in multimodal_model_builder.py. So, I don't think it's supported there.

Can you please try out the pytorch codepath following the links I shared in my last comment?

brb-nv · Jun 04 '25


Hi, thank you for your suggestion. I can start the model directly with:

trtllm-serve model_path \
    --backend pytorch

st-wps · Jun 05 '25


Hi, do you know how to send a request to the HTTP service? I tried the following, but it didn't work:

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "file:///demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ]
    }],
    max_tokens=20,
)

Mitty-ZH · Jun 06 '25


I pass the image through image_url. You can convert a local image to base64 and embed it as a data URL, for example:

{
  "model": "Qwen2.5-VL-7B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}}
      ]
    }
  ]
}
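
A minimal client-side sketch of that request (the base URL assumes trtllm-serve's default of http://localhost:8000/v1; adjust the host, port, and model name for your deployment, and note that whether the served PyTorch backend accepts image_url is discussed in the comments below):

import base64
from openai import OpenAI

# Point the OpenAI-compatible client at the trtllm-serve endpoint
# (host/port assumed; the api_key is unused but required by the client).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Encode a local image as a base64 data URL.
with open("demo.jpeg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=20,
)
print(response.choices[0].message.content)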

st-wps · Jun 06 '25

[ { "role": "user", "content": [ {"type": "text", "text": "What is in this image?"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}} ] } ]

Yeah, I also tried the base64 format, but it reported: openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'image_url is not supported', 'type': 'BadRequestError', 'param': None, 'code': 400}

Mitty-ZH · Jun 06 '25

[ { "role": "user", "content": [ {"type": "text", "text": "What is in this image?"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}} ] } ]

yeah, I also tried base64 format, but it reported openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'image_url is not supported', 'type': 'BadRequestError', 'param': None, 'code': 400}

me too...

joonb14 · Jun 17 '25

[ { "role": "user", "content": [ {"type": "text", "text": "What is in this image?"}, {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD..."}} ] } ]

yeah, I also tried base64 format, but it reported openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'image_url is not supported', 'type': 'BadRequestError', 'param': None, 'code': 400}

Hi, it's a problem with how I phrased it. When using PyTorch as the backend, I get the same error as you, but it works well with offline inference.
PS: the conversion needs to be carried out first before offline inference.
PPS: there will be problems when deploying the server using the converted model.

st-wps · Jun 17 '25