Error when doing inference using tf-trt converted frozen model
Description
I trained an object detection model with TensorFlow 1.14. I am able to deploy the frozen graph to Triton server and perform inference without any issue using the TensorFlow backend.
After that, I optimized the frozen model using the tensorrt:21.10-py3 Docker image. The original frozen graph was 63 MB; the converted, TF-TRT optimized graph is 129 MB.
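For reference, a minimal sketch of the conversion step using the TF 1.x TF-TRT API. The frozen-graph path, output node names (typical for TF Object Detection API SSD models), and precision mode are placeholders; adjust them to the actual model, and note that a TensorFlow build with TF-TRT support must be available in the conversion environment.

```python
# Minimal TF 1.x TF-TRT conversion sketch; paths and node names are assumptions.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph produced by TF 1.14.
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=["detection_boxes", "detection_scores",
                     "detection_classes", "num_detections"],  # output nodes
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16",
    is_dynamic_op=True,  # build TRT engines at runtime for dynamic shapes
)
trt_graph = converter.convert()  # GraphDef with TRTEngineOp nodes embedded

# Save under the file name Triton's TensorFlow backend expects for a GraphDef model.
with tf.io.gfile.GFile("model.graphdef", "wb") as f:
    f.write(trt_graph.SerializeToString())
```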
When deploying the converted, optimized graph with Triton Inference Server, if I set the backend to "tensorrt" and the platform to "tensorrt_plan" in the config file, the model does not load at all.
E0826 14:51:35.815489 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::29] Error Code 1: Serialization (Serialization assertion magicTagRead == magicTag failed.Magic tag does not match)
E0826 14:51:35.815519 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::75] Error Code 4: Internal Error (Engine deserialization failed.)
I0826 14:51:35.826583 1 tensorrt.cc:5123] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0826 14:51:35.826609 1 tensorrt.cc:5062] TRITONBACKEND_ModelFinalize: delete model state
E0826 14:51:35.827106 1 model_repository_manager.cc:1186] failed to load 'my_model' version 1: Internal: unable to create TensorRT engine
I0826 14:51:35.827218 1 server.cc:522]
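For context: the tensorrt backend / tensorrt_plan platform expects a serialized TensorRT engine (model.plan), while a TF-TRT optimized model is still a TensorFlow GraphDef containing TRTEngineOp nodes, which is consistent with the "magic tag" deserialization failure above. Below is a minimal config.pbtxt sketch for keeping the model on the TensorFlow backend instead (the model name is taken from the log above; input/output names, dtypes, and dims are placeholders for an ssd_inception_v2-style model and must match the actual graph). The GraphDef itself would sit at my_model/1/model.graphdef in the model repository.

```
name: "my_model"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    dims: [ 300, 300, 3 ]
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  }
]
```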
When I change the backend to "tensorflow" and the platform to "tensorflow_graphdef", the model does load, but when I run the client code for inference I get the following error:
2022-08-26 14:59:01.513135: E tensorflow/core/common_runtime/executor.cc:645] Executor failed to create kernel. Invalid argument: The TF function for the TRT segment could not be empty
[[{{node TRTEngineOp_0}}]]
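This error indicates that a TRTEngineOp node's embedded TF segment function is empty at load time; one common cause is a mismatch between the TF/TensorRT versions used for conversion and those inside the serving container. A diagnostic sketch for inspecting the converted GraphDef is below (the attribute names "serialized_segment" and "segment_funcdef_name" are those used by the TF 1.x TF-TRT line, and the path is a placeholder).

```python
# Diagnostic sketch: list TRTEngineOp nodes and check whether their engine and
# fallback-segment attributes are populated. Path and attribute names assumed.
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("models/my_model/1/model.graphdef", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op != "TRTEngineOp":
        continue
    seg = node.attr["serialized_segment"].s if "serialized_segment" in node.attr else b""
    func = node.attr["segment_funcdef_name"].s if "segment_funcdef_name" in node.attr else b""
    print(f"{node.name}: serialized_segment={len(seg)} bytes, "
          f"segment_funcdef_name={func.decode() or '<empty>'}")
```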
Model trained with: TF 1.14
Frozen graph optimized using: nvcr.io/nvidia/tensorrt:21.10-py3 docker image
Triton Information
What version of Triton are you using? nvcr.io/nvidia/tritonserver:21.10-py3 docker image
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
model : ssd_inception_v2
Expected behavior: the TF-TRT optimized model loads in Triton and serves inference requests, just as the original frozen graph does.
Please let me know if any additional information is needed. Thank you.
Hi @purvang3, does the same model work (using the exact same model configuration file) if you just use the TF model (without TF-TRT optimization)? Also, note that only a specific version of TF is compatible with 21.10; a version mismatch might cause this issue. You can find what is included in each version of the container here: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html
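If it helps, here is a minimal sketch for checking the TensorFlow build in each environment (conversion image vs. the TF version listed for 21.10 in the support matrix). Only tf.__version__ is relied on; tf.sysconfig.get_build_info() exists only in recent TF 2.x releases, so it is guarded.

```python
# Print TF version and, where available, the CUDA/cuDNN versions it was built with.
import tensorflow as tf

print("TensorFlow:", tf.__version__)
build_info = getattr(tf.sysconfig, "get_build_info", lambda: {})()
print("CUDA:", build_info.get("cuda_version", "n/a"),
      "cuDNN:", build_info.get("cudnn_version", "n/a"))
```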
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up with this.