onnxruntime_backend
The Triton backend for the ONNX Runtime.
Hi, we are trying to quantise our ONNX models to int8 to run on CPU using: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu We are using dynamic quantisation and relying on AVX2 and AVX512...
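For reference, a minimal sketch of the dynamic int8 quantization step with the ONNX Runtime quantization API; the model paths below are placeholders, not from the issue. Whether the int8 kernels actually hit AVX2/AVX512 (VNNI) code paths depends on the CPU and the ONNX Runtime build.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic quantization: weights are stored as int8, activations are quantized
# on the fly at inference time, so no calibration data set is needed.
quantize_dynamic(
    model_input="model.onnx",        # placeholder path to the FP32 model
    model_output="model.int8.onnx",  # placeholder path for the quantized model
    weight_type=QuantType.QInt8,
)
```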
**ONNX Runtime support for trt_build_heuristics_enable with TensorRT optimization** We observed that some inference requests take an extremely long time when the user traffic changes, without using the TensorRT optimization,...
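For context, a minimal sketch of how this option looks in standalone ONNX Runtime, assuming a build with the TensorRT execution provider and a release that exposes `trt_build_heuristics_enable`; the model path is a placeholder. Whether the Triton ONNX Runtime backend forwards this key from the model configuration is exactly what the issue asks.

```python
import onnxruntime as ort

# Provider options for the TensorRT execution provider; the heuristics flag
# trades some engine tuning for much faster engine builds when shapes change.
trt_options = {
    "trt_fp16_enable": True,
    "trt_engine_cache_enable": True,       # reuse previously built engines
    "trt_build_heuristics_enable": True,   # heuristic-based, faster builds
}
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```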
**Is your feature request related to a problem? Please describe.** The ONNX Runtime backend in Triton Inference Server lacks direct support for minShapes, optShapes, and maxShapes in the model configuration...
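For comparison, standalone ONNX Runtime already accepts explicit TensorRT optimization profiles through provider options (a sketch, assuming a TensorRT-enabled build; the model path and the tensor name `input` are placeholders). The request is to expose an equivalent of these shapes through the Triton model configuration.

```python
import onnxruntime as ort

# Explicit optimization profile for a dynamic-shape model, expressed as
# "tensor_name:dims" strings (placeholder values below).
trt_options = {
    "trt_profile_min_shapes": "input:1x3x224x224",
    "trt_profile_opt_shapes": "input:8x3x224x224",
    "trt_profile_max_shapes": "input:32x3x224x224",
    "trt_engine_cache_enable": True,
}
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```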
**Context** Hey, I'm setting up a Python backend and I am using `dlpack` to keep the tensors on the GPU. As described in [its](https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#input-tensor-device-placement) doc, the tensor will be in either...
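A minimal sketch of that pattern, assuming PyTorch is available in the Python backend environment and using placeholder tensor names `INPUT0`/`OUTPUT0`: the input tensor is passed through DLPack so it stays on whichever device Triton placed it on.

```python
import torch.utils.dlpack
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            # No copy to host: from_dlpack keeps the data on its current device.
            torch_in = torch.utils.dlpack.from_dlpack(in_tensor.to_dlpack())
            torch_out = torch_in * 2  # placeholder computation
            out_tensor = pb_utils.Tensor.from_dlpack(
                "OUTPUT0", torch.utils.dlpack.to_dlpack(torch_out)
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses
```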
**Description** When the workload is high, some models in the Triton ONNX Runtime backend fail, and once a model has failed it never succeeds again. Failures look like: ``` "[StatusCode.INTERNAL] onnx runtime...
I am testing a 'text_recognition' model with output shape [1,26,37]. My config file is set as follows: ``` name: "text_recognition" platform: "onnxruntime_onnx" max_batch_size : 256 input [ { name: "input.1" data_type:...
**Is your feature request related to a problem? Please describe.** This would address many of the existing requests for extending the available option catalogue. It should also reduce the maintenance overhead for options...
Issue Description: I am encountering an error while trying to load a YOLOv8 model with the EfficientNMS_TRT plugin in Triton. The specific error message I am receiving is: ``` UNAVAILABLE:...
**Is your feature request related to a problem? Please describe.** The Triton trace API timing only contains the total inference time. How can I get more detailed timing, such as operator-level or kernel-level timing?...
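As a stop-gap, operator-level timing can be collected from ONNX Runtime itself by enabling its built-in profiler (a sketch with a placeholder model path); Triton's trace API would still only report request-level spans, which is the gap this issue points at.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
sess_options.enable_profiling = True  # record per-operator timing events

session = ort.InferenceSession("model.onnx", sess_options)  # placeholder path
# ... run some inferences with session.run(...) ...

# Writes a JSON trace (viewable in chrome://tracing or Perfetto) and returns its path.
profile_path = session.end_profiling()
print(profile_path)
```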
My model includes a `Dropout` module for inference. When I run my model with `onnxruntime` locally, I set `disabled_optimizers=["EliminateDropout"]`, and I want to know how I can do that...
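For reference, the local usage described above (a sketch with a placeholder model path); the open question is how to request the same behaviour through the Triton ONNX Runtime backend's model configuration.

```python
import onnxruntime as ort

# Skip the EliminateDropout graph transformer so Dropout nodes survive
# graph optimization and remain active at inference time.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["CPUExecutionProvider"],
    disabled_optimizers=["EliminateDropout"],
)
```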