Running multiple TensorRT-optimized models in TensorFlow
I am working on a TensorFlow 2.0 project that uses multiple models for inference. Some of those models were optimized with TF-TRT.
I tried both regular offline conversion and offline conversion with engine serialization. With regular conversion, the TensorRT engine is rebuilt every time the model execution context changes. With serialized engines, I am not able to load more than one TensorRT-optimized model.
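For reference, the conversion with engine serialization that I am describing looks roughly like the sketch below (the model directory, input shape, and conversion parameters are placeholders rather than my exact setup, and the API details may vary slightly between TF versions):

```python
import numpy as np
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Offline TF-TRT conversion of a SavedModel with engine serialization.
# "my_model" and the (1, 224, 224, 3) shape are placeholders.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode="FP16",
    maximum_cached_engines=1)

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="my_model",
    conversion_params=params)
converter.convert()

# Pre-build the TensorRT engines for a representative input shape so they
# are serialized with the SavedModel and not rebuilt at runtime.
def input_fn():
    yield (np.zeros((1, 224, 224, 3), dtype=np.float32),)

converter.build(input_fn=input_fn)
converter.save("my_model_trt")
```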
My application uses a single Session at runtime.
I am using the nvcr.io/nvidia/tensorflow:19.12-tf2-py3 Docker container to optimize the models and run the application.
More details about the issue: https://stackoverflow.com/questions/60967867/running-multiple-tensorrt-optimized-models-in-tensorflow
What is the correct approach to running multiple TensorRT-optimized models with pre-built engines simultaneously in TensorFlow?
Is it a valid solution to use a separate Session for each of those models?
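For context, the usage pattern I would like to achieve is roughly the following (the model directories and input shapes are placeholders, not my actual models):

```python
import numpy as np
import tensorflow as tf

# Load two TF-TRT converted SavedModels in the same process.
# "model_a_trt" and "model_b_trt" are placeholder paths.
model_a = tf.saved_model.load("model_a_trt")
model_b = tf.saved_model.load("model_b_trt")

infer_a = model_a.signatures["serving_default"]
infer_b = model_b.signatures["serving_default"]

# Run inference with both models in the same process; with pre-built
# engines, switching between models should not trigger engine rebuilds.
x_a = tf.constant(np.zeros((1, 224, 224, 3), dtype=np.float32))
x_b = tf.constant(np.zeros((1, 128, 128, 3), dtype=np.float32))

out_a = infer_a(x_a)
out_b = infer_b(x_b)
```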
Thanks for the detailed report. Having multiple models with multiple pre-built engines is a valid use case. We seem to have a problem with the way the engines are cached, and we are working on it. This is related to Issue #195; we will continue the discussion there.
@tfeher I am also having a problem running two TensorRT-optimized models. Inference completes for the first network, but the second network then fails with the errors included below. Is this a similar issue or something completely different? I am using TF 2.1.0, and both models run properly when used separately; however, when I load both models in the same program and run inference sequentially, the second model always fails with the cache size error.
2020-06-23 12:22:53.617659: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at trt_engine_op.cc:494 : Invalid argument: Input shape list size mismatch for PartitionedCall/TRTEngineOp_5, cached size: 6 vs. actual size: 1
2020-06-23 12:22:53.654311: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Input shape list size mismatch for PartitionedCall/TRTEngineOp_5, cached size: 6 vs. actual size: 1
[[{{node PartitionedCall/TRTEngineOp_5}}]]
Traceback (most recent call last):
File "live_inf.py", line 108, in
Function call stack: signature_wrapper
@anoushsepehri I am facing the same issue with multiple networks converted with TensorRT. Have you found any workaround?