onnxruntime_backend
The Triton backend for the ONNX Runtime.
Use the special ORT branch 'tensorrt-8.5ea', brought in with the ORT 1.12.1 release, to make use of the built-in TensorRT parser.
For every instance in a model instance group, a new ORT session is created. This change adds support for sharing a single session across the instances of an instance group. This support can be enabled...
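A minimal model configuration sketch of how such an option might be enabled, assuming it is exposed as a model parameter; the parameter name `share_session` is illustrative and may differ in the actual change:

```
# config.pbtxt (sketch; the "share_session" parameter name is illustrative)
instance_group [
  {
    count: 4
    kind: KIND_GPU
  }
]
parameters {
  key: "share_session"
  value: { string_value: "true" }
}
```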
**Description** Hi all, I recently ran into an issue when deploying an ONNX model: it **consumes too much memory**. Please see the linked [issue](https://github.com/microsoft/onnxruntime/issues/1725); it seems to be a feature of...
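Memory growth like this can often be reproduced with standalone onnxruntime. A minimal Python sketch that disables the CPU memory arena (the ORT behavior the linked issue points at) to compare memory usage against the default; `model.onnx` and the input name/shape are placeholders:

```python
import numpy as np
import onnxruntime as ort

# Disable the CPU memory arena to compare memory usage against the default.
so = ort.SessionOptions()
so.enable_cpu_mem_arena = False

session = ort.InferenceSession("model.onnx", sess_options=so,
                               providers=["CPUExecutionProvider"])

# Placeholder input; adjust the name, shape, and dtype to the actual model.
dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {"input": dummy})
```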
I am trying to use MMpose with the NVIDIA Triton server, but Triton does not support plain PyTorch models; it supports TorchScript, ONNX, and a few other formats. So, I have...
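A minimal sketch of exporting a PyTorch model to ONNX so it can be served by this backend; the model, input shape, and tensor names are placeholders (MMpose also ships its own deployment/export tooling):

```python
import torch
import torchvision

# Placeholder model; substitute the actual model you want to serve.
model = torchvision.models.resnet18(weights=None).eval()

dummy_input = torch.zeros(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=13,
)
```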
Background: My ONNX model includes `Dropout` ops that are executed in `training` mode. However, onnxruntime optimizes away `Dropout` ops by default, so I call `session = ort.InferenceSession(modelPath, disabled_optimizers=["EliminateDropout"])` to avoid that. Question: What...
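A sketch of two ways to keep the `Dropout` nodes: the `disabled_optimizers` argument from the question, or, more coarsely, lowering the graph optimization level; `model.onnx` is a placeholder:

```python
import onnxruntime as ort

# Option 1: disable only the dropout-elimination pass (as in the question).
session = ort.InferenceSession("model.onnx",
                               disabled_optimizers=["EliminateDropout"])

# Option 2 (coarser): turn off graph optimizations entirely, so no rewrite
# passes, including dropout elimination, are applied.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
session = ort.InferenceSession("model.onnx", sess_options=so)
```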
**Is your feature request related to a problem? Please describe.** I would like to use the Intel oneDNN Execution Provider (EP) in the ONNX Runtime build used by the Triton Inference Server ONNX...
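Outside of Triton, the oneDNN EP is selected in standalone onnxruntime roughly as below; it requires an ORT build with oneDNN enabled, and `model.onnx` is a placeholder:

```python
import onnxruntime as ort

# The oneDNN EP shows up as "DnnlExecutionProvider" when ORT was built with it.
print(ort.get_available_providers())

session = ort.InferenceSession(
    "model.onnx",
    providers=["DnnlExecutionProvider", "CPUExecutionProvider"],
)
```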
**Description** Using the same model as in #102, the Triton Inference Server has a memory leak, as observed with `docker stats`, after adding: ``` execution_accelerators { cpu_execution_accelerator : [ {...
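For reference, the complete form of that block as documented for this backend looks roughly like the sketch below; verify it against the README for your Triton release:

```
optimization {
  execution_accelerators {
    cpu_execution_accelerator : [
      {
        name : "openvino"
      }
    ]
  }
}
```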
**Description** I was unable to build the onnxruntime_backend with OpenVINO for Triton Inference Server r22.03 using the compatible ONNX Runtime and TensorRT versions (from the Triton Inference Server compatibility matrix). **Triton Information** r22.03...
When using ONNX with TensorRT, the TensorRT engine cache path saves a lot of time. The drawback is that onnxruntime is not smart enough to avoid using the...
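A configuration sketch of enabling the TensorRT engine cache through the ONNX Runtime TensorRT EP options, assuming the backend exposes the `trt_engine_cache_enable` and `trt_engine_cache_path` parameters (check the backend README for the exact keys supported by your version):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "trt_engine_cache_enable" value: "true" }
        parameters { key: "trt_engine_cache_path" value: "/tmp/trt_cache" }
      }
    ]
  }
}
```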