Error when doing inference using tf-trt converted frozen model
Description
I trained an object detection model with TensorFlow 1.14. I am able to deploy the frozen graph to Triton server and perform inference without any issue using the TensorFlow backend.
After that, I optimized the frozen model using the tensorrt:21.10-py3 Docker image. The original frozen graph was 63 MB; the converted, TF-TRT optimized graph is 129 MB.
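For reference, a minimal sketch of the conversion step using the TF 1.x TF-TRT API. The frozen-graph path, output node names (typical for TF Object Detection API SSD models), and precision mode are placeholders; adjust them to the actual model, and note that a TensorFlow build with TF-TRT support must be available in the conversion environment.

```python
# Minimal TF 1.x TF-TRT conversion sketch; paths and node names are assumptions.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen graph produced by TF 1.14.
with tf.io.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=["detection_boxes", "detection_scores",
                     "detection_classes", "num_detections"],  # output nodes
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode="FP16",
    is_dynamic_op=True,  # build TRT engines at runtime for dynamic shapes
)
trt_graph = converter.convert()  # GraphDef with TRTEngineOp nodes embedded

# Save under the file name Triton's TensorFlow backend expects for a GraphDef model.
with tf.io.gfile.GFile("model.graphdef", "wb") as f:
    f.write(trt_graph.SerializeToString())
```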
When deploying the converted, optimized graph with Triton Inference Server, if I set the backend to "tensorrt" and the platform to "tensorrt_plan" in the config file, the model does not load at all.
E0826 14:51:35.815489 1 logging.cc:43] 1: [stdArchiveReader.cpp::StdArchiveReader::29] Error Code 1: Serialization (Serialization assertion magicTagRead == magicTag failed.Magic tag does not match)
E0826 14:51:35.815519 1 logging.cc:43] 4: [runtime.cpp::deserializeCudaEngine::75] Error Code 4: Internal Error (Engine deserialization failed.)
I0826 14:51:35.826583 1 tensorrt.cc:5123] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0826 14:51:35.826609 1 tensorrt.cc:5062] TRITONBACKEND_ModelFinalize: delete model state
E0826 14:51:35.827106 1 model_repository_manager.cc:1186] failed to load 'my_model' version 1: Internal: unable to create TensorRT engine
I0826 14:51:35.827218 1 server.cc:522]
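For context: the tensorrt backend / tensorrt_plan platform expects a serialized TensorRT engine (model.plan), while a TF-TRT optimized model is still a TensorFlow GraphDef containing TRTEngineOp nodes, which is consistent with the "magic tag" deserialization failure above. Below is a minimal config.pbtxt sketch for keeping the model on the TensorFlow backend instead (the model name is taken from the log above; input/output names, dtypes, and dims are placeholders for an ssd_inception_v2-style model and must match the actual graph). The GraphDef itself would sit at my_model/1/model.graphdef in the model repository.

```
name: "my_model"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "image_tensor"
    data_type: TYPE_UINT8
    dims: [ 300, 300, 3 ]
  }
]
output [
  {
    name: "detection_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "detection_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  }
]
```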
When I change the backend to "tensorflow" and the platform to "tensorflow_graphdef", the model does load, but when I run the client code for inference I get the following error:
2022-08-26 14:59:01.513135: E tensorflow/core/common_runtime/executor.cc:645] Executor failed to create kernel. Invalid argument: The TF function for the TRT segment could not be empty
[[{{node TRTEngineOp_0}}]]
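This error indicates that a TRTEngineOp node's embedded TF segment function is empty at load time; one common cause is a mismatch between the TF/TensorRT versions used for conversion and those inside the serving container. A diagnostic sketch for inspecting the converted GraphDef is below (the attribute names "serialized_segment" and "segment_funcdef_name" are those used by the TF 1.x TF-TRT line, and the path is a placeholder).

```python
# Diagnostic sketch: list TRTEngineOp nodes and check whether their engine and
# fallback-segment attributes are populated. Path and attribute names assumed.
import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("models/my_model/1/model.graphdef", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if node.op != "TRTEngineOp":
        continue
    seg = node.attr["serialized_segment"].s if "serialized_segment" in node.attr else b""
    func = node.attr["segment_funcdef_name"].s if "segment_funcdef_name" in node.attr else b""
    print(f"{node.name}: serialized_segment={len(seg)} bytes, "
          f"segment_funcdef_name={func.decode() or '<empty>'}")
```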
Model trained with: TF 1.14
Frozen graph optimized using: nvcr.io/nvidia/tensorrt:21.10-py3 docker image
Triton Information
What version of Triton are you using? nvcr.io/nvidia/tritonserver:21.10-py3 docker image
Are you using the Triton container or did you build it yourself? Triton container
To Reproduce
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
model : ssd_inception_v2
Expected behavior: the TF-TRT optimized model loads in Triton and serves inference requests, just as the original frozen graph does.
Please let me know if any additional information is needed. Thank you.
Hi @purvang3, does the same model work (using the exact same model configuration file) if you just use the TF model (without TF-TRT optimization)? Also, note that only a specific version of TF is compatible with 21.10; a version mismatch might cause this issue. You can find what is included in each version of the container here: https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html
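If it helps, here is a minimal sketch for checking the TensorFlow build in each environment (conversion image vs. the TF version listed for 21.10 in the support matrix). Only tf.__version__ is relied on; tf.sysconfig.get_build_info() exists only in recent TF 2.x releases, so it is guarded.

```python
# Print TF version and, where available, the CUDA/cuDNN versions it was built with.
import tensorflow as tf

print("TensorFlow:", tf.__version__)
build_info = getattr(tf.sysconfig, "get_build_info", lambda: {})()
print("CUDA:", build_info.get("cuda_version", "n/a"),
      "cuDNN:", build_info.get("cudnn_version", "n/a"))
```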
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up with this.