Support Canary-Qwen-2.5b inference in vLLM and TensorRT
Are there plans to support inference for this model in vLLM and/or TensorRT?
+1
AI generated, please verify
Based on my analysis of the NeMo codebase, there is currently no explicit support for Canary-Qwen-2.5b inference in either vLLM or TensorRT.
While NeMo supports the Qwen2.5-VL models (3B, 7B, 32B, 72B) for standard inference and can export them to Hugging Face format, there is no direct vLLM or TensorRT integration specifically for these models.
The multimodal TensorRT export path in NeMo currently covers models such as NEVA, Video-NEVA, LITA, VILA, VITA, and SALM, but Qwen models are not on that list.
For now, you can run Qwen2.5-VL models through the standard NeMo inference pipeline or export them to Hugging Face format; direct vLLM and TensorRT support appears to be unavailable.
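For what it's worth, Canary-Qwen-2.5b itself is a speech-to-text model exposed through NeMo's SALM class rather than the Qwen2.5-VL vision-language family, so the working path today is the standard NeMo pipeline. Below is a minimal sketch of that path, assuming the `SALM.from_pretrained` / `generate` API from `nemo.collections.speechlm2` described on the model card; the audio file name, the prompt text, and the token-decoding call are illustrative and should be checked against the model card for your NeMo version.

```python
# Minimal sketch of the standard NeMo inference path for Canary-Qwen-2.5b.
# Assumes the SALM API from nemo.collections.speechlm2 shown on the model card;
# verify the exact generate()/tokenizer calls against your NeMo version.
from nemo.collections.speechlm2.models import SALM

model = SALM.from_pretrained("nvidia/canary-qwen-2.5b")

# "speech.wav" is a placeholder audio path; audio_locator_tag marks where the
# audio is injected into the prompt.
answer_ids = model.generate(
    prompts=[
        [{
            "role": "user",
            "content": f"Transcribe the following: {model.audio_locator_tag}",
            "audio": ["speech.wav"],
        }]
    ],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
```

This only covers single-process NeMo inference; it does not replace a proper vLLM or TensorRT-LLM export path for the speech encoder + LLM combination, which is what this issue is asking for.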
+1
+1 Would like to see support for exporting speech-based models to vLLM! :)