onnxruntime_backend
The Triton backend for the ONNX Runtime.
Tested with OV 2024.X. This PR should be merged after ORT is upgraded to 1.18.
**Description** In ONNXRuntime, the OpenVINO EP accepts configuration options to set the number of threads and number of streams documented [here](https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#cc-api-20), but these are ignored when passed to the EP...
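For reference, below is a minimal sketch of how these same options are passed to the OpenVINO EP through the ONNX Runtime Python API, outside of Triton. The model path is hypothetical and the exact option keys (`num_of_threads`, `num_streams`) vary by ONNX Runtime and OpenVINO version, so check the OpenVINO EP documentation for the build in use; the issue is that the equivalent settings appear to be ignored when forwarded by the Triton backend.

```python
# Minimal sketch, assuming a local model.onnx and an ORT build with the
# OpenVINO EP enabled. Option keys are taken from the OpenVINO EP docs and
# may differ across ORT versions.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[
        (
            "OpenVINOExecutionProvider",
            {
                "device_type": "CPU",     # target device for the EP
                "num_of_threads": "8",    # thread count the issue tries to set
                "num_streams": "2",       # stream count the issue tries to set
            },
        )
    ],
)
print(sess.get_providers())  # confirm which EPs were actually selected
```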
I followed the compilation instructions in the README, and at the end I hit the `UNAVAILABLE: Unsupported: Triton TRITONBACKEND API version: 1.16 does not support 'onnxruntime' TRITONBACKEND API version: 1.19` error...
Bringing this to the `main` branch as well, since the current main pipelines target CUDA 12.5.
server: https://github.com/triton-inference-server/server/pull/7717
**Description** When deploying an ONNX model using the Triton Inference Server's ONNX runtime backend, the inference performance on the CPU is noticeably slower compared to running the same model using...
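When investigating this kind of report, a useful first step is a plain-onnxruntime baseline on the same machine to compare against the latency seen through Triton. The sketch below assumes a local `model.onnx`, a single float32 input, and a thread count of 8; adjust these to match the actual model and the thread settings used in the Triton deployment.

```python
# Rough CPU-latency baseline with plain onnxruntime, for comparison against
# the numbers observed through the Triton ONNX Runtime backend.
import time
import numpy as np
import onnxruntime as ort

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8  # pin threads so both setups are comparable

sess = ort.InferenceSession("model.onnx", sess_opts,
                            providers=["CPUExecutionProvider"])

# Build a dummy input, replacing dynamic dimensions with 1 (assumes float32).
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

# Warm up, then time repeated runs.
for _ in range(5):
    sess.run(None, {inp.name: dummy})

n = 100
start = time.perf_counter()
for _ in range(n):
    sess.run(None, {inp.name: dummy})
print(f"mean latency: {(time.perf_counter() - start) / n * 1000:.2f} ms")
```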
**Description** I'm trying to deploy a text-to-speech model with ONNX and Triton. When running the server, I get this error: `failed: Protobuf parsing failed.` The model status is also `UNAVAILABLE:`...
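A `Protobuf parsing failed` error from the backend usually means the `.onnx` file itself cannot be parsed (for example a corrupted download or a Git LFS pointer checked out in place of the real file). A quick way to narrow this down is to verify the file parses outside Triton; the repository path below is a hypothetical layout.

```python
# Sanity-check the model file outside Triton when "Protobuf parsing failed"
# is reported. Path is an assumption -- point it at the file actually placed
# in the model repository (e.g. <model_repo>/<model_name>/1/model.onnx).
import onnx

path = "model_repository/tts/1/model.onnx"  # hypothetical repository layout
model = onnx.load(path)            # fails here if the protobuf itself is corrupt
onnx.checker.check_model(model)    # validates graph structure and opset metadata
print(onnx.helper.printable_graph(model.graph)[:500])  # peek at the parsed graph
```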
Hi! > I wasn't sure whether to place this under bug or whether it works as intended. I'm currently facing an issue where a model, deployed via the Triton ONNX backend,...
**Description** When the onnxruntime backend fails to load a model, it causes a GPU memory leak. **Triton Information** r23.12 and r24.07. Are you using the Triton container or did you build it yourself? Using nvcr.io/nvidia/tritonserver:r23.12-py3...