Triton Server OpenVINO backend not working with TensorFlow SavedModels
Description
Triton is unable to load models in TensorFlow SavedModel format with the OpenVINO backend.
Triton Information
What version of Triton are you using? 23.10, 23.11, 23.12, 24.03, and 24.04 don't work.
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
Basically follow https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/TensorFlow, but change the backend to OpenVINO.
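For a self-contained repro, the SavedModel can be exported roughly like this (a minimal sketch following the linked tutorial; the container tag and export path are assumptions matching the config below):

docker run --rm -v ${PWD}:/workspace -w /workspace nvcr.io/nvidia/tensorflow:24.03-tf2-py3 \
  python3 -c "import tensorflow as tf; tf.saved_model.save(tf.keras.applications.ResNet50(weights='imagenet'), 'model_repository/resnet50/1/model.saved_model')"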
Model config:
name: "resnet50"
backend: "openvino"
#platform: "tensorflow_savedmodel"
default_model_filename: "model.saved_model"
max_batch_size : 0
input [
{
name: "input_1"
data_type: TYPE_FP32
dims: [-1, 224, 224, 3 ]
}
]
output [
{
name: "predictions"
data_type: TYPE_FP32
dims: [-1, 1000]
}
]
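For reference, this is the model repository layout the config above assumes (the files inside the SavedModel directory are the standard TensorFlow ones):

model_repository/
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.saved_model/
            ├── saved_model.pb
            └── variables/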
Command and logs:
docker run --rm -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
#docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
W0509 14:21:08.011250 1 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0509 14:21:08.011288 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0509 14:21:08.011338 1 server.cc:243] CudaDriverHelper has not been initialized.
I0509 14:21:08.013345 1 model_lifecycle.cc:469] loading: resnet50:1
I0509 14:21:08.022993 1 openvino.cc:1373] TRITONBACKEND_Initialize: openvino
I0509 14:21:08.023009 1 openvino.cc:1383] Triton TRITONBACKEND API version: 1.19
I0509 14:21:08.023012 1 openvino.cc:1389] 'openvino' TRITONBACKEND API version: 1.19
I0509 14:21:08.023059 1 openvino.cc:1473] TRITONBACKEND_ModelInitialize: resnet50 (version 1)
terminate called after throwing an instance of 'triton::backend::BackendModelInstanceException'
Expected behavior
The model loads successfully.
I found out that the Triton image contains two copies of OpenVINO, and one of them is missing some of the OpenVINO libraries:
root@8bc8eab2d6ce:/# find -name "*openvino*" | grep -v 2330 | grep -v 23\.3\.0 | grep -v LICENSE | grep -v "libopenvino_c\|libopenvino.so"
./opt/tritonserver/backends/openvino
./opt/tritonserver/backends/openvino/libopenvino_intel_gna_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_tensorflow_lite_frontend.so
./opt/tritonserver/backends/openvino/libtriton_openvino.so
./opt/tritonserver/backends/openvino/libopenvino_onnx_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_auto_batch_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_pytorch_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_paddle_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_intel_gpu_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_tensorflow_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_gapi_preproc.so
./opt/tritonserver/backends/openvino/libopenvino_auto_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_hetero_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_intel_cpu_plugin.so
./opt/tritonserver/backends/onnxruntime/libopenvino_onnx_frontend.so
./opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_openvino.so
./opt/tritonserver/backends/onnxruntime/libopenvino_ir_frontend.so
./opt/tritonserver/backends/onnxruntime/libopenvino_intel_cpu_plugin.so
./opt/tritonserver/backends/onnxruntime/libopenvino_tensorflow_frontend.so
So this problem most likely also affects the TF Lite, PaddlePaddle, and PyTorch model formats.
The culprit is most likely here: https://github.com/triton-inference-server/onnxruntime_backend/blob/48cc4f132a451a8dfebe501583d88acb5243dc38/tools/gen_ort_dockerfile.py#L311, since not all of the libraries are copied.
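One quick way to compare the two copies directly (an illustrative bash snippet, run inside the container):

diff <(ls /opt/tritonserver/backends/openvino | grep '^libopenvino') \
     <(ls /opt/tritonserver/backends/onnxruntime | grep '^libopenvino')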
@tanmayv25 for visibility.
@atobiszei The OpenVINO backend in Triton does not support models saved in the SavedModel format. Read about Triton's OpenVINO backend here: https://github.com/triton-inference-server/openvino_backend?tab=readme-ov-file#openvino-backend
You'd have to convert the SavedModel into an OpenVINO IR model (.xml and .bin files) using the Model Optimizer tool, then place these files in the model directory instead of the TF SavedModel directory.
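For illustration, a conversion along those lines might look like this (a sketch using OpenVINO's Model Optimizer; exact flags depend on your OpenVINO version, and the paths match the config above):

pip install openvino-dev  # provides the 'mo' converter
mo --saved_model_dir model.saved_model --model_name model --output_dir model_repository/resnet50/1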
@tanmayv25 This paragraph states otherwise: https://github.com/triton-inference-server/openvino_backend#loading-non-default-model-format.
When I removed the ONNX Runtime backend from the Triton image and tuned the shape parameters in the config, it worked fine.
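For anyone wanting to try the same workaround, a minimal sketch (this deletes the whole ONNX Runtime backend, so it is only viable if you don't serve ONNX models):

docker run --rm -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 \
  bash -c "rm -rf /opt/tritonserver/backends/onnxruntime && tritonserver --model-repository=/models"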
Thanks for the correction. It seems the feature to load SavedModel was added recently. We need to revisit the Triton image to make sure there are no conflicting dependencies. The OpenVINO backend should use its own installation of the OpenVINO library instead of the one shipped with onnxruntime.
This would also make it possible to install different OpenVINO versions for the OpenVINO and ONNXRuntime backends.