onnxruntime_backend
The Triton backend for the ONNX Runtime.
**Description** Thanks for your good work! When I build the Triton server with Docker with the ONNX backend, I meet so...
**Description** I noticed a pattern in CPU utilization when I ran the same GPU model on two VMs: both with 1 T4 GPU, one with 16 cores and one...
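If the difference comes from ONNX Runtime sizing its thread pools from the host's core count, one way to make the two VMs comparable is to pin the thread pools via the backend's per-model parameters. A minimal `config.pbtxt` sketch, assuming a thread count of 8 is appropriate for this model (the value is illustrative):

```
# config.pbtxt fragment: fix the ONNX Runtime thread pool sizes so CPU
# utilization does not scale with the number of host cores.
parameters { key: "intra_op_thread_count" value: { string_value: "8" } }
parameters { key: "inter_op_thread_count" value: { string_value: "1" } }
```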
**Description** When I enabled `max_queue_delay_microseconds` to improve the response speed of the model, I found occasional errors. I set `max_queue_delay_microseconds` to 70000. Then I sent three tensor...
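For reference, a minimal sketch of the dynamic batching setting described in this report (other model config fields omitted); 70000 is the value quoted above:

```
# config.pbtxt fragment: enable dynamic batching with the reported queue delay
dynamic_batching {
  max_queue_delay_microseconds: 70000
}
```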
**Description** I am testing tritonserver on the example models fetched using this script: https://github.com/triton-inference-server/server/blob/main/docs/examples/fetch_models.sh The Triton server is run as follows:
```
export MODEL_PATH=/tmp/tensorrt-inference-server
/opt/tritonserver/bin/tritonserver --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt...
```
**Description** The OnnxRt-OpenVINO backend produces errors when run with Triton. The error shows up when running the BERT ONNX model from the [zoo](https://github.com/winnerineast/models-onnx/blob/master/text/machine_comprehension/bert-squad/model/bertsquad8.onnx). However, when the same model is...
**Description** When trying to load an ONNX model with an auto-generated config file, the following error was thrown:
```
E1006 22:22:40.180598 23016 model_repository_manager.cc:1186] failed to load 'ads_model' version 1: Invalid argument:...
```
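A common workaround while the auto-complete path is debugged is to supply an explicit `config.pbtxt` alongside the model. A minimal sketch, using hypothetical tensor names, types, and shapes that must be replaced with the model's real inputs and outputs:

```
# Hypothetical explicit config.pbtxt for the model that failed to auto-configure
name: "ads_model"
backend: "onnxruntime"
max_batch_size: 8
input [
  {
    name: "INPUT__0"       # placeholder; must match the ONNX graph input name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"      # placeholder; must match the ONNX graph output name
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```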
**Description** The generation looks for the ["CUDNN_VERSION" environment variable on the host system](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L429-L435) first, and only later uses the [version in the docker image](https://github.com/triton-inference-server/onnxruntime_backend/blob/main/tools/gen_ort_dockerfile.py#L94-L98). cuDNN ships with the docker image, so it may...
Determine which outputs are needed by the requests in the batch and only calculate those (the TF backend contains a representative implementation).
Hi, I was wondering if you planned at some point to support the ONNX Runtime extensions detailed in their repo https://github.com/microsoft/onnxruntime-extensions. This would allow/unlock a lot of possibilities such as post...
**Description** We're using `--backend-config=onnxruntime,default-max-batch-size=128` to enable large client-side batches for all of our models; however, we want to limit dynamic batches to a much lower value for more predictable...
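One possible approach, sketched below under assumed values: keep the backend-wide default for client-side batches, and add a per-model `config.pbtxt` that steers the dynamic batcher toward smaller batches with `preferred_batch_size`. Note that `max_batch_size` remains the hard upper bound for both client-submitted and dynamically formed batches, so this guides rather than caps the batcher:

```
# Per-model config.pbtxt fragment (values are illustrative)
max_batch_size: 128                 # still accept large client-side batches
dynamic_batching {
  preferred_batch_size: [ 8, 16 ]   # steer the dynamic batcher toward smaller batches
  max_queue_delay_microseconds: 100 # don't wait long trying to fill a preferred batch
}
```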