Triton Server OpenVINO backend not working with TensorFlow SavedModels
Description
Triton is unable to load models in TensorFlow SavedModel format with the OpenVINO backend.
Triton Information
What version of Triton are you using? 23.10, 23.11, 23.12, 24.03, and 24.04 don't work.
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
Basically follow https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/TensorFlow, but change the backend to OpenVINO.
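For a self-contained repro, the SavedModel can be exported roughly like this (a minimal sketch following the linked tutorial; the container tag and export path are assumptions matching the config below):

docker run --rm -v ${PWD}:/workspace -w /workspace nvcr.io/nvidia/tensorflow:24.03-tf2-py3 \
  python3 -c "import tensorflow as tf; tf.saved_model.save(tf.keras.applications.ResNet50(weights='imagenet'), 'model_repository/resnet50/1/model.saved_model')"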
Model config:
name: "resnet50"
backend: "openvino"
#platform: "tensorflow_savedmodel"
default_model_filename: "model.saved_model"
max_batch_size : 0
input [
{
name: "input_1"
data_type: TYPE_FP32
dims: [-1, 224, 224, 3 ]
}
]
output [
{
name: "predictions"
data_type: TYPE_FP32
dims: [-1, 1000]
}
]
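For reference, this is the model repository layout the config above assumes (the files inside the SavedModel directory are the standard TensorFlow ones):

model_repository/
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.saved_model/
            ├── saved_model.pb
            └── variables/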
Command and logs:
docker run --rm -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
#docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available.
Use the NVIDIA Container Toolkit to start this container with GPU support; see
https://docs.nvidia.com/datacenter/cloud-native/ .
W0509 14:21:08.011250 1 pinned_memory_manager.cc:271] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0509 14:21:08.011288 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
E0509 14:21:08.011338 1 server.cc:243] CudaDriverHelper has not been initialized.
I0509 14:21:08.013345 1 model_lifecycle.cc:469] loading: resnet50:1
I0509 14:21:08.022993 1 openvino.cc:1373] TRITONBACKEND_Initialize: openvino
I0509 14:21:08.023009 1 openvino.cc:1383] Triton TRITONBACKEND API version: 1.19
I0509 14:21:08.023012 1 openvino.cc:1389] 'openvino' TRITONBACKEND API version: 1.19
I0509 14:21:08.023059 1 openvino.cc:1473] TRITONBACKEND_ModelInitialize: resnet50 (version 1)
terminate called after throwing an instance of 'triton::backend::BackendModelInstanceException'
Expected behavior
The model loads successfully.
I found out that the Triton image contains two copies of OpenVINO, and one of them is missing some of the OpenVINO libraries:
root@8bc8eab2d6ce:/# find -name "*openvino*" | grep -v 2330 | grep -v 23\.3\.0 | grep -v LICENSE | grep -v "libopenvino_c\|libopenvino.so"
./opt/tritonserver/backends/openvino
./opt/tritonserver/backends/openvino/libopenvino_intel_gna_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_tensorflow_lite_frontend.so
./opt/tritonserver/backends/openvino/libtriton_openvino.so
./opt/tritonserver/backends/openvino/libopenvino_onnx_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_auto_batch_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_pytorch_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_paddle_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_intel_gpu_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_tensorflow_frontend.so
./opt/tritonserver/backends/openvino/libopenvino_gapi_preproc.so
./opt/tritonserver/backends/openvino/libopenvino_auto_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_hetero_plugin.so
./opt/tritonserver/backends/openvino/libopenvino_intel_cpu_plugin.so
./opt/tritonserver/backends/onnxruntime/libopenvino_onnx_frontend.so
./opt/tritonserver/backends/onnxruntime/libonnxruntime_providers_openvino.so
./opt/tritonserver/backends/onnxruntime/libopenvino_ir_frontend.so
./opt/tritonserver/backends/onnxruntime/libopenvino_intel_cpu_plugin.so
./opt/tritonserver/backends/onnxruntime/libopenvino_tensorflow_frontend.so
So this problem most likely also affects the TF Lite, PaddlePaddle, and PyTorch model formats.
The culprit is most likely here: https://github.com/triton-inference-server/onnxruntime_backend/blob/48cc4f132a451a8dfebe501583d88acb5243dc38/tools/gen_ort_dockerfile.py#L311, since not all of the libraries are copied.
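One quick way to compare the two copies directly (an illustrative bash snippet, run inside the container):

diff <(ls /opt/tritonserver/backends/openvino | grep '^libopenvino') \
     <(ls /opt/tritonserver/backends/onnxruntime | grep '^libopenvino')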
@tanmayv25 for visibility.
@atobiszei The OpenVINO backend in Triton does not support models saved in the SavedModel format. Read about Triton's OpenVINO backend here: https://github.com/triton-inference-server/openvino_backend?tab=readme-ov-file#openvino-backend
You'd have to convert the SavedModel into an OpenVINO IR model (.xml and .bin files) using the Model Optimizer tool, then place these files in the model directory instead of the TF SavedModel directory.
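For illustration, a conversion along those lines might look like this (a sketch using OpenVINO's Model Optimizer; exact flags depend on your OpenVINO version, and the paths match the config above):

pip install openvino-dev  # provides the 'mo' converter
mo --saved_model_dir model.saved_model --model_name model --output_dir model_repository/resnet50/1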
@tanmayv25 This paragraph states otherwise: https://github.com/triton-inference-server/openvino_backend#loading-non-default-model-format.
When I removed the ONNX Runtime backend from the Triton image and tuned the shape parameters in the config, it worked fine.
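For anyone wanting to try the same workaround, a minimal sketch (this deletes the whole ONNX Runtime backend, so it is only viable if you don't serve ONNX models):

docker run --rm -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:24.03-py3 \
  bash -c "rm -rf /opt/tritonserver/backends/onnxruntime && tritonserver --model-repository=/models"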
Thanks for the correction. It seems the feature to load SavedModel was added recently. We need to revisit the Triton image to make sure there are no conflicting dependencies. The OpenVINO backend should use its own installation of the OpenVINO library instead of the one shipped with onnxruntime.
This would also make it possible to install different OpenVINO versions for the OpenVINO and ONNXRuntime backends.