KarlDe1
Results: 2 issues of KarlDe1
I'm currently using the Qwen2.5-VL model in a single-node, single-process environment, with 8 H20 GPUs on one machine. I want to deploy the model on Triton, with each GPU loading...
**Describe the issue** I’m implementing a TensorRT plugin for SDPA with BF16 input/output. My goal is to build the compute graph only once during initialization, so I placed the graph-construction...