Kris Hung
Hi @aohorodnyk, could you please share the command you run for the gRPC interface? Also, a minimal reproducer would be really helpful for us to investigate this issue.
Hi team, I was wondering if we have any update on this issue?
Hi @pranavsharma, just wanted to follow up and see if we have any update on this, thank you!
@gedoensmax I think using CUDA graph indeed helps with the performance. I wasn't able to run the model used by the RIVA team due to the issue ``` onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] :...
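For context, this is roughly how CUDA graph can be turned on through ONNX Runtime's Python API; the `enable_cuda_graph` provider option is a CUDA EP setting and the model path here is a placeholder, not the RIVA model — a minimal sketch, not the exact command used above:

```python
import onnxruntime as ort

# Sketch only: enable CUDA graph capture via the CUDA execution provider option.
# "model.onnx" is a placeholder. Note that CUDA graph replay also requires static
# input shapes and binding inputs/outputs to fixed device buffers via IO binding.
providers = [("CUDAExecutionProvider", {"enable_cuda_graph": "1"})]
sess = ort.InferenceSession("model.onnx", providers=providers)
```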
We were able to resolve the performance regression by setting `cudnn_conv_use_max_workspace` to 0, now that this PR provides the flexibility to do so in the Triton onnxruntime backend:...
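For reference, a minimal sketch of what this could look like in a model's `config.pbtxt`, assuming the option is passed through the CUDA execution-accelerator parameters of the onnxruntime backend (exact field names may vary by backend version):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "cuda"
        parameters { key: "cudnn_conv_use_max_workspace" value: "0" }
      }
    ]
  }
}
```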
Hi @burling, thanks for filing the issue. Could you please provide the repro steps and the model files so that we can investigate further? I think the Python backend stub...
@AWallyAllah Could you please share the model file with us so that we can further investigate?
Closing due to lack of activity. Please re-open the issue and provide the model files if you would like to follow up on it.
Hi @allan-navarro, Triton's `-py3-igpu*` containers should support JetPack 6.X. For JP 5.1.1, @nv-kmcgill53 do you know if we have supported containers?
Hi @AniForU, the custom metrics support in the Python backend utilizes the in-process Triton C API under the hood, so you should be able to retrieve the custom metrics by any...
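For example, here is a minimal sketch of scraping such metrics from Triton's Prometheus metrics endpoint (default port 8002); the metric family name below is hypothetical:

```python
import requests

# Custom metrics registered in a Python backend model are exported through
# Triton's regular metrics endpoint, alongside the built-in metrics.
metrics = requests.get("http://localhost:8002/metrics").text
for line in metrics.splitlines():
    if line.startswith("my_custom_metric"):  # hypothetical metric family name
        print(line)
```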