
The Triton backend for the ONNX Runtime.

81 onnxruntime_backend issues, sorted by recently updated.

Hi, we are trying to quantise our ONNX models to int8 to run on CPU using https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu. We are using dynamic quantisation and banking on AVX2 and AVX512...
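
For reference, dynamic quantization is normally done offline with the ONNX Runtime quantization tool before the model is placed in the Triton model repository. A minimal sketch (file names are placeholders):

```python
# Minimal sketch of offline dynamic int8 quantization with ONNX Runtime.
# File names are placeholders; run this before deploying the model to Triton.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",    # original FP32 model
    model_output="model_int8.onnx",   # quantized model to serve from the model repository
    weight_type=QuantType.QInt8,      # int8 weights; AVX2/AVX512 kernels are used on CPU where available
)
```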

**Does OnnxRuntime have support for trt_build_heuristics_enable with TensorRT optimization?** We observed that, without using TensorRT optimization, some inference requests take an extremely long time when the user traffic changes,...
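
For context, TensorRT is enabled in the ONNX Runtime backend through the execution-accelerator block of the model configuration; whether `trt_build_heuristics_enable` is forwarded from there to the TensorRT execution provider is exactly what this issue asks. A hedged sketch of the shape such a config would take (the heuristics key is the unconfirmed part):

```
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [
      {
        name : "tensorrt"
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "max_workspace_size_bytes" value: "1073741824" }
        # Unconfirmed: this is the option the issue asks about.
        parameters { key: "trt_build_heuristics_enable" value: "true" }
      }
    ]
  }
}
```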

**Is your feature request related to a problem? Please describe.** The ONNX Runtime backend in Triton Inference Server lacks direct support for minShapes, optShapes, and maxShapes in the model configuration...
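
For comparison, standalone ONNX Runtime exposes optimization-profile shapes as TensorRT execution-provider options; the feature request is essentially about surfacing the equivalent through the Triton model configuration. A sketch of the standalone usage (model path, input name, and shapes are placeholders, and the option names assume a recent ONNX Runtime release):

```python
import onnxruntime as ort

# Sketch: set TensorRT optimization-profile shapes directly on the execution provider.
# "images" and the shape strings are placeholders for the actual dynamic input.
providers = [(
    "TensorrtExecutionProvider",
    {
        "trt_profile_min_shapes": "images:1x3x640x640",
        "trt_profile_opt_shapes": "images:8x3x640x640",
        "trt_profile_max_shapes": "images:32x3x640x640",
    },
)]
sess = ort.InferenceSession("model.onnx", providers=providers)
```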

**Context** Hey, I'm setting up a Python backend and using `dlpack` to keep the tensors on GPU. As described in [its](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#input-tensor-device-placement) doc, the tensor will be in either...
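
For reference, the usual pattern with the Python backend is to check where the input tensor lives and bridge it to a framework tensor via DLPack without a copy. A minimal sketch of what that could look like inside a model's `execute()` loop (tensor names are placeholders):

```python
# Minimal sketch for a Python-backend model; tensor names are placeholders.
import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack, to_dlpack

def handle_request(request):
    in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
    # The input may arrive on CPU or GPU depending on the server's placement decision.
    if in_tensor.is_cpu():
        torch_tensor = from_dlpack(in_tensor.to_dlpack()).cuda()  # copy to GPU if needed
    else:
        torch_tensor = from_dlpack(in_tensor.to_dlpack())         # zero-copy, stays on GPU
    # ... run the model on torch_tensor ...
    out_tensor = pb_utils.Tensor.from_dlpack("OUTPUT0", to_dlpack(torch_tensor))
    return pb_utils.InferenceResponse(output_tensors=[out_tensor])
```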

**Description** When the workload is high, some models in the Triton ONNX Runtime backend will fail, and after a model fails it never succeeds again. Failures look like: ``` "[StatusCode.INTERNAL] onnx runtime...

bug

I am testing the 'text_recognition' model with output shape [1,26,37]. ![image](https://github.com/triton-inference-server/onnxruntime_backend/assets/14834787/a90cab84-5f29-4e22-a8bc-768f864acf6f) My config file is set as ``` name: "text_recognition" platform: "onnxruntime_onnx" max_batch_size : 256 input [ { name: "input.1" data_type:...
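
Note that when `max_batch_size` is greater than zero, the `dims` in the Triton config omit the leading batch dimension, so a model output of shape [1,26,37] corresponds to dims [26,37]. A hedged sketch of a full config along those lines (the input type/shape and the output name are placeholders, not taken from the actual model):

```
name: "text_recognition"
platform: "onnxruntime_onnx"
max_batch_size: 256
input [
  {
    name: "input.1"
    data_type: TYPE_FP32      # placeholder; use the model's actual input type
    dims: [ 1, 32, 100 ]      # placeholder per-request shape, without the batch dimension
  }
]
output [
  {
    name: "output"            # placeholder output name
    data_type: TYPE_FP32
    dims: [ 26, 37 ]          # [1,26,37] from the model, minus the batch dimension
  }
]
```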

**Is your feature request related to a problem? Please describe.** This would address many of the existing requests to extend the available option catalogue. It should also reduce the maintenance overhead for options...

Issue Description: I am encountering an error while trying to load a YOLOv8 model with the EfficientNMS_TRT plugin in Triton. The specific error message I am receiving is: UNAVAILABLE:...

**Is your feature request related to a problem? Please describe.** The Triton trace API timing only contains the total inference time. How can we get more detailed timing, such as operator-level or kernel-level timing?...
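
For context, operator-level timing usually comes from ONNX Runtime's own profiler rather than from Triton's trace API; whether the ONNX Runtime backend exposes a switch to turn that profiler on is what this request is about. A sketch of the standalone usage (model path, input name, and input shape are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Sketch: per-operator timing via ONNX Runtime's built-in profiler, outside Triton.
opts = ort.SessionOptions()
opts.enable_profiling = True                      # write a JSON trace with per-operator timings
sess = ort.InferenceSession("model.onnx", sess_options=opts)

input_array = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
sess.run(None, {"input": input_array})                       # "input" is a placeholder name
profile_file = sess.end_profiling()               # path to the chrome-trace JSON file
print(profile_file)
```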

My model includes a `Dropout` module for inference, and when I run my model with `onnxruntime` locally I set `disabled_optimizers=["EliminateDropout"]`. I want to know how I can do that...
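
For reference, this is the standalone usage the issue describes; the open question is how to pass the same setting through the Triton ONNX Runtime backend's configuration. A minimal local sketch (model path and input are placeholders):

```python
import numpy as np
import onnxruntime as ort

# Sketch of the local usage described above: keep Dropout at inference time
# by disabling the graph-optimization pass that would remove it.
sess = ort.InferenceSession(
    "model.onnx",                              # placeholder path
    disabled_optimizers=["EliminateDropout"],  # skip the pass that eliminates Dropout nodes
)
out = sess.run(None, {"input": np.zeros((1, 8), dtype=np.float32)})  # placeholder input
```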