KarlDe1 comments

Results: 7 comments of KarlDe1

Multimodal support for OpenAI-Compatible frontend

I'm currently using the Qwen2.5-VL model in a single-node, single-process environment, with 8 H20 GPUs on one machine. I want to deploy the model on Triton, with each GPU loading...

Multimodal support for OpenAI-Compatible frontend

Seeking help from @deadeyegoodwin @GuanLuo @tanmayv25, thanks!

Multimodal support for OpenAI-Compatible frontend

#8216

[vLLM backend] Multimodal support for OpenAI-Compatible frontend

I hope the developers can respond to this.

`configurePlugin` is called repeatedly for my BF16 SDPA plugin — how to run initialization graph only once?

Hi @poweiw, my model contains multiple uses of my custom plugin, but all shapes in the model are fixed. At the moment, I am not sure what could be causing...

`configurePlugin` is called repeatedly for my BF16 SDPA plugin — how to run initialization graph only once?

[related_issue](https://github.com/NVIDIA/cudnn-frontend/issues/189)

`configurePlugin` is called repeatedly for my BF16 SDPA plugin — how to run initialization graph only once?

@poweiw Thank you for your reply. I have two questions: (1) I would like to clarify what **“changed”** means here. Does it mean that `configurePlugin()` will be triggered...