onnxruntime_backend
The Triton backend for the ONNX Runtime.
**Description** Thanks for your good work! When I build the Triton server with Docker with the ONNX backend, I meet so...
**Description** I noticed a pattern in CPU utilization when I ran the same GPU model on two VMs: both with 1 T4 GPU, one with 16 cores and one...
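If the difference comes from ONNX Runtime sizing its thread pools from the host's core count, one way to make the two VMs comparable is to pin the thread pools via the backend's per-model parameters. A minimal `config.pbtxt` sketch, assuming a thread count of 8 is appropriate for this model (the value is illustrative):

```
# config.pbtxt fragment: fix the ONNX Runtime thread pool sizes so CPU
# utilization does not scale with the number of host cores.
parameters { key: "intra_op_thread_count" value: { string_value: "8" } }
parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
```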
**Description** When I enabled `max_queue_delay_microseconds` to improve the response speed of the model, I found occasional errors. I set `max_queue_delay_microseconds` to 70000. Then I sent three tensor...
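For reference, a minimal sketch of the dynamic batching setting described in this report (other model config fields omitted); 70000 is the value quoted above:

```
# config.pbtxt fragment: enable dynamic batching with the reported queue delay
dynamic_batching {
  max_queue_delay_microseconds: 70000
}
```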
**Description** I am testing tritonserver on the example models fetched using this script: https://github.com/triton-inference-server/server/blob/main/docs/examples/fetch_models.sh The Triton server is run as follows:
```
export MODEL_PATH=/tmp/tensorrt-inference-server
/opt/tritonserver/bin/tritonserver --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt...
```
**Description** The OnnxRt-OpenVINO backend produces errors when run with Triton. The error shows up when running the BERT ONNX model from the [zoo](https://github.com/winnerineast/models-onnx/blob/master/text/machine_comprehension/bert-squad/model/bertsquad8.onnx). However, when the same model is...
**Description** When trying to load an ONNX model with an auto-generated config file, the following error was thrown:
```
E1006 22:22:40.180598 23016 model_repository_manager.cc:1186] failed to load 'ads_model' version 1: Invalid argument:...
```
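A common workaround while the auto-complete path is debugged is to supply an explicit `config.pbtxt` alongside the model. A minimal sketch, using hypothetical tensor names, types, and shapes that must be replaced with the model's real inputs and outputs:

```
# Hypothetical explicit config.pbtxt for the model that failed to auto-configure
name: "ads_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT__0"       # placeholder; must match the ONNX graph input name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"      # placeholder; must match the ONNX graph output name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```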
**Description** The generation looks for the ["CUDNN_VERSION" environment variable on the host system](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L429-L435) first, and only later uses the [version in the docker image](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L94-L98). cuDNN ships with the docker image, so it may...
Determine which outputs are needed by the requests in the batch and only calculate those (the TF backend contains a representative implementation).
Hi, I was wondering if you planned at some point to support the ONNX Runtime extensions detailed in their repo https://github.com/microsoft/onnxruntime-extensions. This would allow/unlock a lot of possibilities such as post...
**Description** We're using `--backend-config=onnxruntime,default-max-batch-size=128` to enable large client-side batches for all of our models; however, we want to limit dynamic batches to a much lower value for more predictable...
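One possible approach, sketched below under assumed values: keep the backend-wide default for client-side batches, and add a per-model `config.pbtxt` that steers the dynamic batcher toward smaller batches with `preferred_batch_size`. Note that `max_batch_size` remains the hard upper bound for both client-submitted and dynamically formed batches, so this guides rather than caps the batcher:

```
# Per-model config.pbtxt fragment (values are illustrative)
max_batch_size: 128                 # still accept large client-side batches
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]   # steer the dynamic batcher toward smaller batches
  max_queue_delay_microseconds: 100 # don't wait long trying to fill a preferred batch
}
```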