onnxruntime_backend
Version dependency still present
Description
I would like to be able to replace the libonnxruntime.so
binary (as well as associated ones) without rebuilding the entire backend, for easier experimentation / testing / debugging. There was a PR by your team a year ago to make this possible; however, when I tried it, it failed.
Triton Information
What version of Triton are you using?
I've tried the latest release (2.27), the release that included the PR mentioned above (2.17), and an earlier version (2.15). The error on that last version was expected, but it wasn't on the later ones.
Are you using the Triton container or did you build it yourself?
I used the Triton container, but replaced the libonnxruntime.so
binaries with the ones distributed by the ONNX Runtime repo, e.g.: https://github.com/microsoft/onnxruntime/releases/tag/v1.9.0
To Reproduce
For instance, I tried to use the ORT binary for version 1.9.0 with container versions above 21.12.
- Download the ORT binary and extract the archive
- Replace the file at
backends/onnxruntime/libonnxruntime.so
with the downloaded binary
- Launch the container
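Concretely, the swap looks roughly like this (a sketch of my setup; the release asset name, local backends directory, and model repository path are from my environment and may need adjusting):

```shell
# Sketch: swap the ORT runtime library under the Triton backend directory.
# Paths and the archive name below are assumptions from my setup.
ORT_VERSION=1.9.0
wget "https://github.com/microsoft/onnxruntime/releases/download/v${ORT_VERSION}/onnxruntime-linux-x64-gpu-${ORT_VERSION}.tgz"
tar -xzf "onnxruntime-linux-x64-gpu-${ORT_VERSION}.tgz"

# Overwrite the backend's bundled library with the downloaded one.
cp "onnxruntime-linux-x64-gpu-${ORT_VERSION}/lib/libonnxruntime.so" \
   backends/onnxruntime/libonnxruntime.so

# Launch the server against the patched backend directory.
tritonserver --model-repository models --backend-directory backends $LOG_ARGS
```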
I get the following error log with version 21.12:
I1123 15:00:29.369262 20 metrics.cc:298] Collecting metrics for GPU 0: NVIDIA A10G
I1123 15:00:29.369647 20 shared_library.cc:108] OpenLibraryHandle: backends/onnxruntime/libtriton_onnxruntime.so
The given version [10] is not supported, only version 1 to 9 is supported in this build.
I1123 15:00:29.371314 20 onnxruntime.cc:2210] TRITONBACKEND_Initialize: onnxruntime
I1123 15:00:29.371321 20 onnxruntime.cc:2220] Triton TRITONBACKEND API version: 1.7
I1123 15:00:29.371325 20 onnxruntime.cc:2226] 'onnxruntime' TRITONBACKEND API version: 1.7
I1123 15:00:29.371328 20 onnxruntime.cc:2256] backend configuration:
{}
Segmentation fault (core dumped) tritonserver --model-repository models --backend-directory backends $LOG_ARGS
And this with version 22.10:
I1123 15:02:02.200461 22 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f58e0000000' with size 268435456
I1123 15:02:02.201130 22 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1123 15:02:02.205638 22 model_config_utils.cc:646] Server side auto-completed config: name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 32
input {
name: "x"
data_type: TYPE_INT64
dims: -1
}
input {
name: "y"
data_type: TYPE_INT64
dims: -1
}
output {
name: "z"
data_type: TYPE_FP16
dims: -1
dims: 9
}
instance_group {
count: 1
kind: KIND_GPU
}
default_model_filename: "model.onnx"
parameters {
key: "enable_mem_arena"
value {
string_value: "1"
}
}
parameters {
key: "enable_mem_pattern"
value {
string_value: "1"
}
}
parameters {
key: "execution_mode"
value {
string_value: "1"
}
}
parameters {
key: "memory.enable_memory_arena_shrinkage"
value {
string_value: "cpu:0"
}
}
backend: "onnxruntime"
I1123 15:02:02.205739 22 model_lifecycle.cc:459] loading: my_model:1
I1123 15:02:02.205912 22 backend_model.cc:302] Adding default backend config setting: default-max-batch-size,4
I1123 15:02:02.205955 22 shared_library.cc:108] OpenLibraryHandle: backends/onnxruntime/libtriton_onnxruntime.so
The given version [13] is not supported, only version 1 to 9 is supported in this build.
I1123 15:02:02.207689 22 onnxruntime.cc:2459] TRITONBACKEND_Initialize: onnxruntime
I1123 15:02:02.207698 22 onnxruntime.cc:2469] Triton TRITONBACKEND API version: 1.10
I1123 15:02:02.207702 22 onnxruntime.cc:2475] 'onnxruntime' TRITONBACKEND API version: 1.10
I1123 15:02:02.207706 22 onnxruntime.cc:2505] backend configuration:
{"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"backends","default-max-batch-size":"4"}}
Segmentation fault (core dumped) tritonserver --model-repository models --backend-directory backends $LOG_ARGS
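For context on the version error: my reading (an assumption on my part, not something the Triton team has confirmed here) is that ONNX Runtime release 1.x exports C API versions 1 through x, and the backend requests the API version of the ORT it was compiled against, so swapping in an older library trips the check. A minimal sketch of that logic:

```python
# Assumption (my reading, not confirmed): ORT release 1.x supports C API
# versions 1 through x, and the Triton onnxruntime backend requests the API
# version of the ORT it was compiled against.

def ort_max_api_version(release: str) -> int:
    """Map an ORT release like '1.9.0' to the highest C API version it supports."""
    major, minor = release.split(".")[:2]
    if major != "1":
        raise ValueError(f"unexpected ORT major version: {release}")
    return int(minor)

def swap_is_compatible(requested_api: int, swapped_in_release: str) -> bool:
    """A swapped-in libonnxruntime.so must support the backend's requested API."""
    return ort_max_api_version(swapped_in_release) >= requested_api

# 21.12 backend requests API 10; the 1.9.0 library tops out at 9:
print(swap_is_compatible(10, "1.9.0"))  # False -> "version [10] is not supported"
# 22.10 backend requests API 13:
print(swap_is_compatible(13, "1.9.0"))  # False -> "version [13] is not supported"
```

If that reading is right, a newer libonnxruntime.so should pass the check, but an older one never will, which matches both logs above.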
Expected behavior
I would like to be able to override the ORT binary without being blocked (apparently) by Triton itself.
This would help me, e.g. with this issue, to identify whether the slowdown I've been experiencing since ORT was bumped to version 1.10.0 is due to ORT or to Triton.
Thank you for filing this detailed issue. We have filed a ticket to investigate this bug.