KarlDe1

2 issues by KarlDe1

I'm currently running the Qwen2.5-VL model in a single-node, single-process environment with 8 H20 GPUs. I want to deploy the model on Triton, with each GPU loading...

**Describe the issue** I’m implementing a TensorRT plugin for SDPA with BF16 input/output. My goal is to build the compute graph only once during initialization, so I placed the graph-construction...