
Support Canary-Qwen-2.5b inference in vLLM and TensorRT

Open vmazalov opened this issue 4 months ago • 4 comments

Are there plans to support inference for this model in vLLM and/or TensorRT?

vmazalov avatar Aug 21 '25 00:08 vmazalov

+1

motapinto avatar Sep 04 '25 14:09 motapinto

AI-generated, please verify:

Based on my analysis of the NeMo codebase, there is currently no explicit support for Canary-Qwen-2.5b model inference in either vLLM or TensorRT.

While NeMo supports Qwen2.5-VL models (3B, 7B, 32B, 72B) for standard inference and provides export capabilities to Hugging Face format, there is no direct integration with vLLM or TensorRT specifically for these models.

The multimodal TensorRT export functionality in NeMo currently supports other models like NEVA, Video-NEVA, LITA, VILA, VITA, and SALM, but Qwen models are not included in this list.

For now, you can use the standard NeMo inference pipeline for Qwen2.5-VL models or export them to Hugging Face format, but direct vLLM and TensorRT support appears to be unavailable.
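For the Canary-Qwen-2.5b model this issue is about, the standard NeMo pipeline looks roughly like the sketch below, following the public model card. Treat it as a sketch only: the `SALM` module path, the `generate()` prompt format, and `audio_locator_tag` are taken from the model card and may differ across NeMo releases, and `speech.wav` is a hypothetical example file.

```python
# Sketch: plain NeMo inference for nvidia/canary-qwen-2.5b (no vLLM/TensorRT).
# Class path and generate() signature follow the model card; verify against
# your installed NeMo version before relying on them.
from nemo.collections.speechlm2.models import SALM

model = SALM.from_pretrained("nvidia/canary-qwen-2.5b")

# Ask the model to transcribe a local audio file; model.audio_locator_tag marks
# where the audio is injected into the prompt.
answer_ids = model.generate(
    prompts=[
        [
            {
                "role": "user",
                "content": f"Transcribe the following: {model.audio_locator_tag}",
                "audio": ["speech.wav"],  # hypothetical example file
            }
        ]
    ],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
```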

zhenyih avatar Sep 04 '25 17:09 zhenyih

+1

Anderstask1 avatar Nov 18 '25 11:11 Anderstask1

+1 Would like to see support for exporting speech-based models to vLLM! :)

test-dan-run avatar Dec 02 '25 01:12 test-dan-run