
Multimodal support for OpenAI-Compatible frontend

Open nzarif opened this issue 7 months ago • 3 comments

Is your feature request related to a problem? Please describe. I am trying to profile a multimodal model with genai-perf. My model is not OpenAI-compatible by default, so I am launching it through Triton's OpenAI-Compatible frontend. It appears that this frontend cannot handle multimodal input, for example images or image_url.
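For context, a minimal sketch of the OpenAI-style multimodal chat request this refers to, i.e. the payload shape the frontend does not handle. The base URL, port, model name, and image URL are placeholders, not values from this thread:

```python
from openai import OpenAI

# Placeholder endpoint for a locally running Triton OpenAI-Compatible frontend;
# the port and api_key handling are assumptions for illustration only.
client = OpenAI(base_url="http://localhost:9000/v1", api_key="unused")

response = client.chat.completions.create(
    model="my-multimodal-model",  # placeholder model name
    messages=[
        {
            "role": "user",
            # Multimodal messages use a content array mixing text and image_url
            # parts, per the OpenAI chat completions format.
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sample.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```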

Describe the solution you'd like It would be nice if the OpenAI-Compatible frontend supported multimodal input as well.

Describe alternatives you've considered If possible, let me know how I can change the frontend's code to enable multimodal support.

Thank you!

nzarif avatar May 15 '25 00:05 nzarif

I'm currently using the Qwen2.5-VL model in a single-node, single-process environment with 8 H20 GPUs on one machine. I want to deploy the model on Triton, with each GPU loading one instance of the model, using vLLM as the backend. The client sends HTTP requests in the OpenAI API format, so I also hope Triton's frontend can support multimodal input (I'm working on autonomous driving inference, which involves image data). So far, however, I've only managed to get text-only inference working; multimodal support does not seem to be available.
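For anyone attempting a similar setup, here is a minimal sketch of generating the vLLM engine config consumed by the Triton vLLM backend's model.json. The model id, repository path, and argument values are assumptions for illustration, not a verified Qwen2.5-VL deployment:

```python
import json

# Engine arguments follow vLLM's AsyncEngineArgs field names; values below
# are placeholders, not tested settings for this thread's hardware.
engine_args = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # assumed Hugging Face model id
    "tensor_parallel_size": 1,               # one GPU per model instance
    "gpu_memory_utilization": 0.9,
}

# Assumed model repository layout: <repo>/<model_name>/1/model.json
with open("model_repository/qwen2_5_vl/1/model.json", "w") as f:
    json.dump(engine_args, f, indent=2)
```

Note this only configures the engine; it does not by itself make the OpenAI-Compatible frontend accept image inputs, which is the gap this issue is about.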

KarlDe1 avatar May 21 '25 02:05 KarlDe1

Seeking help from @deadeyegoodwin @GuanLuo @tanmayv25, thanks!

KarlDe1 avatar May 21 '25 02:05 KarlDe1

#8216

KarlDe1 avatar May 21 '25 09:05 KarlDe1

Is there anyone who can help take a look?

soulseen avatar Jul 17 '25 09:07 soulseen