onnxruntime_backend

The Triton backend for the ONNX Runtime.

81 onnxruntime_backend issues, sorted by recently updated

Tested with OV 2024.X. This PR should be merged after ORT is upgraded to 1.18.

**Description** In ONNX Runtime, the OpenVINO EP accepts configuration options to set the number of threads and the number of streams, documented [here](https://onnxruntime.ai/docs/execution-providers/OpenVINO-ExecutionProvider.html#cc-api-20), but these are ignored when passed to the EP...
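For reference, a minimal sketch (outside of Triton) of passing these same options through the onnxruntime Python API, to compare against what the backend forwards to the EP. The model path and option values are placeholders, and the option names follow the linked OpenVINO EP documentation rather than anything specific to this backend.

```python
import onnxruntime as ort

# Placeholder model path and placeholder option values; option names follow
# the OpenVINO EP documentation for ONNX Runtime.
sess = ort.InferenceSession(
    "model.onnx",
    sess_options=ort.SessionOptions(),
    providers=["OpenVINOExecutionProvider"],
    provider_options=[{
        "device_type": "CPU",     # OpenVINO target device
        "num_of_threads": "8",    # inference threads per stream
        "num_streams": "2",       # parallel inference streams
    }],
)
print(sess.get_providers())
```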

I followed the README instructions for compilation, and at the end I hit the `UNAVAILABLE: Unsupported: Triton TRITONBACKEND API version: 1.16 does not support 'onnxruntime' TRITONBACKEND API version: 1.19` error...

Bringing this to the `main` branch as well, since the current main pipelines are targeting CUDA 12.5

server: https://github.com/triton-inference-server/server/pull/7717

**Description** When deploying an ONNX model using the Triton Inference Server's ONNX Runtime backend, inference performance on the CPU is noticeably slower than running the same model using...
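One way to make that comparison concrete is to time the same model with plain onnxruntime on CPU while pinning the thread settings explicitly; a rough sketch follows, with the model path, input shape, iteration count, and thread counts all as placeholder assumptions.

```python
import time
import numpy as np
import onnxruntime as ort

# Pin thread counts so the standalone run is comparable to whatever the
# Triton instance-group/session configuration uses. Values are placeholders.
so = ort.SessionOptions()
so.intra_op_num_threads = 4
so.inter_op_num_threads = 1

sess = ort.InferenceSession("model.onnx", sess_options=so,
                            providers=["CPUExecutionProvider"])
name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape

sess.run(None, {name: x})  # warm-up
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {name: x})
print("mean latency: %.2f ms" % ((time.perf_counter() - t0) / 100 * 1000))
```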

**Description** I'm trying to deploy a text-to-speech model with ONNX and Triton. When running the server, I get this error: `failed: Protobuf parsing failed.` The model status is also: UNAVAILABLE:...
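A `Protobuf parsing failed` error usually means the `.onnx` file itself cannot be read (for example a truncated download, a Git LFS pointer file, or a file that is not actually an ONNX protobuf). A quick sanity check outside of Triton, with the repository path below as a placeholder:

```python
import onnx

# Placeholder path into the Triton model repository.
model_path = "model_repository/tts_model/1/model.onnx"

model = onnx.load(model_path)      # raises if the protobuf cannot be parsed
onnx.checker.check_model(model)    # validates graph structure and opsets
print("opset:", model.opset_import[0].version)
```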

Hi! I wasn't sure whether to file this as a bug or whether it works as intended. I'm currently facing an issue where a model deployed via the Triton ONNX backend...

**Description** A failed model load in the onnxruntime backend causes a GPU memory leak. **Triton Information** r23.12 and r24.07. Are you using the Triton container or did you build it yourself? Using nvcr.io/nvidia/tritonserver:r23.12-py3...
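For a self-contained reproduction, one hedged sketch (assuming the server is started with `--model-control-mode=explicit` and using a hypothetical model name and URL) is to repeatedly request a load that is known to fail via the model-control API and watch GPU memory, e.g. with `nvidia-smi`, for growth between attempts:

```python
import tritonclient.http as httpclient
from tritonclient.utils import InferenceServerException

# Placeholder URL and model name; the model is assumed to fail to load.
client = httpclient.InferenceServerClient(url="localhost:8000")
for i in range(20):
    try:
        client.load_model("broken_onnx_model")
    except InferenceServerException as e:
        print(f"load attempt {i} failed as expected: {e}")
```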