KarlDe1 issues

Results: 2 issues by KarlDe1

[vLLM backend] Multimodal support for OpenAI-Compatible frontend

I'm currently using the Qwen2.5-VL model in a single-node, single-process environment, with 8 H20 GPUs on one machine. I want to deploy the model on Triton, with each GPU loading...

`configurePlugin` is called repeatedly for my BF16 SDPA plugin — how to run initialization graph only once?

**Describe the issue** I’m implementing a TensorRT plugin for SDPA with BF16 input/output. My goal is to build the compute graph only once during initialization, so I placed the graph-construction...