
Support Canary-Qwen-2.5b inference in vLLM and TensorRT

Open vmazalov opened this issue 4 months ago • 4 comments

Are there plans to support inference for this model in vLLM and/or TensorRT?

vmazalov avatar Aug 21 '25 00:08 vmazalov

+1

motapinto avatar Sep 04 '25 14:09 motapinto

AI-generated, please verify:

Based on my analysis of the NeMo codebase, there is currently no explicit support for Canary-Qwen-2.5b model inference in either vLLM or TensorRT.

While NeMo supports Qwen2.5-VL models (3B, 7B, 32B, 72B) for standard inference and provides export capabilities to Hugging Face format, there is no direct integration with vLLM or TensorRT specifically for these models.

The multimodal TensorRT export functionality in NeMo currently supports other models like NEVA, Video-NEVA, LITA, VILA, VITA, and SALM, but Qwen models are not included in this list.

For now, you can use the standard NeMo inference pipeline for Qwen2.5-VL models or export them to Hugging Face format, but direct vLLM and TensorRT support appears to be unavailable.
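For the Canary-Qwen-2.5b model this issue is about, the standard NeMo pipeline looks roughly like the sketch below, following the public model card. Treat it as a sketch only: the `SALM` module path, the `generate()` prompt format, and `audio_locator_tag` are taken from the model card and may differ across NeMo releases, and `speech.wav` is a hypothetical example file.

```python
# Sketch: plain NeMo inference for nvidia/canary-qwen-2.5b (no vLLM/TensorRT).
# Class path and generate() signature follow the model card; verify against
# your installed NeMo version before relying on them.
from nemo.collections.speechlm2.models import SALM

model = SALM.from_pretrained("nvidia/canary-qwen-2.5b")

# Ask the model to transcribe a local audio file; model.audio_locator_tag marks
# where the audio is injected into the prompt.
answer_ids = model.generate(
    prompts=[
        [
            {
                "role": "user",
                "content": f"Transcribe the following: {model.audio_locator_tag}",
                "audio": ["speech.wav"],  # hypothetical example file
            }
        ]
    ],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
```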

zhenyih avatar Sep 04 '25 17:09 zhenyih

+1

Anderstask1 avatar Nov 18 '25 11:11 Anderstask1

+1 Would like to see support for exporting speech-based models to vLLM! :)

test-dan-run avatar Dec 02 '25 01:12 test-dan-run