onnxruntime_backend
The Triton backend for the ONNX Runtime.
Hi, we are trying to quantise our ONNX models to int8 to run on CPU using: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu We are using dynamic quantisation and relying on AVX2 and AVX512...
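For reference, a minimal sketch of the dynamic int8 quantization step with the ONNX Runtime quantization API; the model paths below are placeholders, not from the issue. Whether the int8 kernels actually hit AVX2/AVX512 (VNNI) code paths depends on the CPU and the ONNX Runtime build.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are stored as int8, activations are quantized
# on the fly at inference time, so no calibration data set is needed.
quantize_dynamic(
    model_input="model.onnx",        # placeholder path to the FP32 model
    model_output="model.int8.onnx",  # placeholder path for the quantized model
    weight_type=QuantType.QInt8,
)
```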
**ONNX Runtime support for trt_build_heuristics_enable with TensorRT optimization** We observed that some inference requests take an extremely long time when the user traffic changes, without using the TensorRT optimization,...
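For context, a minimal sketch of how this option looks in standalone ONNX Runtime, assuming a build with the TensorRT execution provider and a release that exposes `trt_build_heuristics_enable`; the model path is a placeholder. Whether the Triton ONNX Runtime backend forwards this key from the model configuration is exactly what the issue asks.

```python
import onnxruntime as ort

# Provider options for the TensorRT execution provider; the heuristics flag
# trades some engine tuning for much faster engine builds when shapes change.
trt_options = {
    "trt_fp16_enable": True,
    "trt_engine_cache_enable": True,       # reuse previously built engines
    "trt_build_heuristics_enable": True,   # heuristic-based, faster builds
}
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```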
**Is your feature request related to a problem? Please describe.** The ONNX Runtime backend in Triton Inference Server lacks direct support for minShapes, optShapes, and maxShapes in the model configuration...
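For comparison, standalone ONNX Runtime already accepts explicit TensorRT optimization profiles through provider options (a sketch, assuming a TensorRT-enabled build; the model path and the tensor name `input` are placeholders). The request is to expose an equivalent of these shapes through the Triton model configuration.

```python
import onnxruntime as ort

# Explicit optimization profile for a dynamic-shape model, expressed as
# "tensor_name:dims" strings (placeholder values below).
trt_options = {
    "trt_profile_min_shapes": "input:1x3x224x224",
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
    "trt_engine_cache_enable": True,
}
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```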
**Context** Hey, I'm setting up a Python backend and I am using `dlpack` to keep the tensors on the GPU. As described in [its](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#input-tensor-device-placement) doc, the tensor will be in either...
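A minimal sketch of that pattern, assuming PyTorch is available in the Python backend environment and using placeholder tensor names `INPUT0`/`OUTPUT0`: the input tensor is passed through DLPack so it stays on whichever device Triton placed it on.

```python
import torch.utils.dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # No copy to host: from_dlpack keeps the data on its current device.
            torch_in = torch.utils.dlpack.from_dlpack(in_tensor.to_dlpack())
            torch_out = torch_in * 2  # placeholder computation
            out_tensor = pb_utils.Tensor.from_dlpack(
                "OUTPUT0", torch.utils.dlpack.to_dlpack(torch_out)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```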
**Description** When the workload is high, some models in the Triton ONNX Runtime backend fail, and once a model has failed it never succeeds again. Failures look like: ``` "[StatusCode.INTERNAL] onnx runtime...
I am testing a 'text_recognition' model with output shape [1,26,37]. My config file is set as follows: ``` name: "text_recognition" platform: "onnxruntime_onnx" max_batch_size : 256 input [ { name: "input.1" data_type:...
**Is your feature request related to a problem? Please describe.** This would address many of the existing requests for extending the available option catalogue. It should also reduce the maintenance overhead for options...
Issue Description: I am encountering an error while trying to load a YOLOv8 model with the EfficientNMS_TRT plugin in Triton. The specific error message I am receiving is: ``` UNAVAILABLE:...
**Is your feature request related to a problem? Please describe.** The Triton trace API timing only contains the total inference time. How can I get more detailed timing, such as operator-level or kernel-level timing?...
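As a stop-gap, operator-level timing can be collected from ONNX Runtime itself by enabling its built-in profiler (a sketch with a placeholder model path); Triton's trace API would still only report request-level spans, which is the gap this issue points at.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.enable_profiling = True  # record per-operator timing events

session = ort.InferenceSession("model.onnx", sess_options)  # placeholder path
# ... run some inferences with session.run(...) ...

# Writes a JSON trace (viewable in chrome://tracing or Perfetto) and returns its path.
profile_path = session.end_profiling()
print(profile_path)
```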
My model includes a `Dropout` module for inference. When I run my model with `onnxruntime` locally, I set `disabled_optimizers=["EliminateDropout"]`, and I want to know how I can do that...
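For reference, the local usage described above (a sketch with a placeholder model path); the open question is how to request the same behaviour through the Triton ONNX Runtime backend's model configuration.

```python
import onnxruntime as ort

# Skip the EliminateDropout graph transformer so Dropout nodes survive
# graph optimization and remain active at inference time.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CPUExecutionProvider"],
    disabled_optimizers=["EliminateDropout"],
)
```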