TensorRT rebuilds ops at run time
At conversion time, I call
converter.convert()
converter.build(input_fn=some_input_fn)
converter.save()
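(For context, a minimal, self-contained sketch of this conversion flow; the SavedModel paths and the input function below are assumptions chosen to match the [25,64,4,4] shapes in the logs, not the original code.)
import numpy as np
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Hypothetical input function yielding one batch per model input,
# matching the [[25,64,4,4], [25,64,4,4]] shapes reported below.
def some_input_fn():
    yield (tf.constant(np.zeros([25, 64, 4, 4], np.float32)),
           tf.constant(np.zeros([25, 64, 4, 4], np.float32)))

converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model")  # path is an assumption
converter.convert()
converter.build(input_fn=some_input_fn)  # engines for these shapes are built here, at conversion time
converter.save("trt_saved_model")        # output path is an assumption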
And see logging to the effect of
tensorflow/core/grappler/optimizers/meta_optimizer.cc:814] Optimization results for grappler item: StatefulPartitionedCall/model_1/TRTEngineOp_1_native_segment
2020-03-18 01:30:00.932297: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] constant_folding: Graph size after: 33 nodes (0), 32 edges (0), time = 1.199ms.
2020-03-18 01:30:00.932302: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] layout: Graph size after: 33 nodes (0), 32 edges (0), time = 1.307ms.
2020-03-18 01:30:00.932307: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] constant_folding: Graph size after: 33 nodes (0), 32 edges (0), time = 1.228ms.
...
2020-03-18 01:30:37.946847: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for StatefulPartitionedCall/model_1/TRTEngineOp_1 with input shapes: [[25,64,4,4], [25,64,4,4]]
However, at run time, after loading the saved model and running it a couple of times, I see that the model builds a new engine.
2020-03-18 01:31:58.716377: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:736] Building a new TensorRT engine for PartitionedCall/StatefulPartitionedCall/model_1/TRTEngineOp_1 with input shapes: [[25,64,4,4], [25,64,4,4]]
In particular, the new name nests the previous engine name under an extra PartitionedCall prefix. Any thoughts on why a new engine would be built despite the earlier build call, or how to debug this? I have verified that the input shapes are the same, as you can see in the logs.
EDIT: It appears this is a semi-known issue that is meant to be handled by name canonicalization here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc#L869 , but that logic is not being applied correctly here: PartitionedCall/StatefulPartitionedCall/model_1/TRTEngineOp_1 and StatefulPartitionedCall/model_1/TRTEngineOp_1 are not matched to the same engine despite having the same suffix.
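(To illustrate the intended behavior, here is a rough Python sketch of that kind of prefix canonicalization; it is only an illustration of the idea, not the actual C++ implementation linked above.)
# Illustration only: strip the call-op wrappers that the runtime prepends,
# so that both names would resolve to the same cached engine.
def canonical_engine_name(name):
    parts = name.split("/")
    while parts and parts[0] in ("PartitionedCall", "StatefulPartitionedCall"):
        parts = parts[1:]
    return "/".join(parts)

assert (canonical_engine_name("PartitionedCall/StatefulPartitionedCall/model_1/TRTEngineOp_1")
        == canonical_engine_name("StatefulPartitionedCall/model_1/TRTEngineOp_1"))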
I am having the same issue. I need low-latency inference for a stateful recurrent network, and the engine rebuild adds a lot of latency. Throughput also drops after the rebuild: my application runs at ~40 FPS, and after the engine is rebuilt it drops to ~30 FPS.
I tried updating to tf-2.2-rc1; it still has the same issue.
@nrothGIT did you find any workaround using the information you posted?
CC @bixia1
@cbodenst, Sadly I haven't had the time to dig a ton further on this, so have no good fix at the moment. I'll post back if I make any progress.
Thanks for the report. It should not try to rebuild the engines again, but it is difficult to say anything more at this stage. It might help if you can post the full logs with level 2:
TF_CPP_VMODULE=trt_engine_op=2,convert_nodes=2,convert_graph=2,segment=2,trt_shape_optimization_profiles=2,trt_engine_resource_ops=2 python my_script.py
It would be better if you can share a reproducer, so that we can have a closer look.
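(If it is more convenient, the same logging flags can also be exported from inside the script, as long as this happens before TensorFlow is imported; a minimal sketch:)
import os

# Must be set before the TensorFlow runtime initializes, i.e. before `import tensorflow`.
os.environ["TF_CPP_VMODULE"] = (
    "trt_engine_op=2,convert_nodes=2,convert_graph=2,"
    "segment=2,trt_shape_optimization_profiles=2,trt_engine_resource_ops=2"
)

import tensorflow as tf  # imported only after the logging variable is set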
@tfeher I will take a look into writing a reproducer, but I am not sure how to build one since TensorRT optimizations are strongly hardware dependent.
I attached a log file. You can see there is a rebuild at the beginning of the script; after another two minutes it builds the engine again. You can just grep for "Building TensorRT engine".
I also figured out why the model is slower after rebuilding: tf-2.2.0rc2 fails to build one node and uses the native segment instead. With tf-2.1.0 the whole model was converted properly. In the test you see in the logs I converted the model directly with tf-2.2.0rc2.
My specs: CUDA 10.1, cuDNN 7.6, TensorRT 6.5, tensorflow 2.2.0rc2
Hello, I'm facing a similar issue while trying to run multiple TensorRT-optimized models in TF2.
import os

import numpy as np
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

TENSORRT_WEIGHTS_ROOT = '/path/to/weights/root/'

class TensorRTModel:
    def __init__(self, model_name, weights_root=TENSORRT_WEIGHTS_ROOT):
        self.weights_root = weights_root
        self.model_name = model_name
        self.model, self.infer = self.__load_tensor_rt_model()

    def __load_tensor_rt_model(self):
        saved_model_loaded = tf.saved_model.load(
            os.path.join(
                self.weights_root,
                self.model_name,
                f'{self.model_name}_Conversions.TRT_FP32_MODEL'),
            tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']
        return saved_model_loaded, infer

class Model1(TensorRTModel):
    pass  # some model specific code

class Model2(TensorRTModel):
    pass  # some model specific code

if __name__ == '__main__':
    model1 = Model1('model1')
    model2 = Model2('model2')
    for i in range(1000):
        print(i)
        model1_input = tf.constant(np.random.normal(size=(8, 640, 640, 3)).astype(np.float32))
        result = model1.infer(model1_input)
        model2_input = tf.constant(np.random.normal(size=(8, 224, 224, 3)).astype(np.float32))
        result = model2.infer(model2_input)
Model1 is optimized by TensorRT into 2 ops:
PartitionedCall/TRTEngineOp_1 input shapes: [[8,640,640,3]]
PartitionedCall/TRTEngineOp_0 input shapes: [[8,1536,18,18]]
Model2 is optimized by TensorRT into 1 op:
PartitionedCall/TRTEngineOp_0 input shapes: [[8,224,224,3]]
At runtime, the engine for PartitionedCall/TRTEngineOp_0 gets rebuilt at every iteration, while the engine for PartitionedCall/TRTEngineOp_1 is built only once.
Logs attached.
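(One way to confirm that the two converted models ended up with identical TRTEngineOp names, which looks like the trigger for the collision, is to list the TRTEngineOp nodes in each SavedModel. A minimal sketch, reusing the paths and signature key from the snippet above as assumptions:)
import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

def trt_engine_op_names(saved_model_dir):
    # Collect node names from the outer graph and from all nested functions,
    # since TRTEngineOp nodes usually live inside PartitionedCall functions.
    loaded = tf.saved_model.load(saved_model_dir, tags=[tag_constants.SERVING])
    graph_def = loaded.signatures['serving_default'].graph.as_graph_def()
    nodes = list(graph_def.node)
    for func in graph_def.library.function:
        nodes.extend(func.node_def)
    return [n.name for n in nodes if n.op == 'TRTEngineOp']

# Hypothetical usage with the two model directories from the snippet above:
# print(trt_engine_op_names('/path/to/weights/root/model1/model1_Conversions.TRT_FP32_MODEL'))
# print(trt_engine_op_names('/path/to/weights/root/model2/model2_Conversions.TRT_FP32_MODEL'))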
Environment:
nvcr.io/nvidia/tensorflow:19.12-tf2-py3 docker container
Is it possible to manually set Op name/prefix so those two Ops would not be considered the same entity?
We merged a change for this on April 3. Can you update your tree to check that again?
Can you share a link to the corresponding PR?
https://github.com/tensorflow/tensorflow/blob/c3e12bcd6d01f3bc92cbb99f3c91fc0bec3a8562/tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc#L674-L675
https://github.com/tensorflow/tensorflow/commit/21de77485f9230eff364a4f0c8e0a55857965223
@bixia1 @tfeher I've tried to run the code snippet I shared with you using the latest version of the TF master branch. As you can see from the logs, it no longer outputs any TensorRT engine building related information, but the run time remains the same, as if the engines were still being rebuilt at each iteration.
Both models were optimized using the new version of TF as well.
I tested again with tensorflow 2.2.0rc3 and it is still rebuilding. The program runs fine for 10 minutes, but then it suddenly rebuilds the engine.
I also tried to build the current master (f98da6aa76ab991fb06f2de24ee7a8b87348d66e). It is still rebuilding the engines.
Is there any activity on this issue?
I will look into this, it is the next thing on my todo list.
@andriiyurkiv Can you post a simplified version of your test case?
Can one of you post a reproducer for this?
@bixia1 I will work on that
Please try this code. I'm not sure if this is hardware dependent, but it builds the engine twice: once at converter.build and once when the iteration starts. However, I did not get it to rebuild again after multiple predictions, as is the case in my production code.
from tensorflow.python.compiler.tensorrt import trt_convert as trt
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Flatten, Conv2D, Dense, Input
from tensorflow.keras.models import load_model
import numpy as np
import tensorflow as tf

data = tf.constant(np.empty([1, 200, 200, 3], np.float32))

def gen():
    yield [data]

def create_model():
    i = Input((200, 200, 3), batch_size=1)
    x = Conv2D(16, (3, 3))(i)
    x = Flatten()(x)
    x = Dense(10, activation="relu")(x)
    x = Dense(10, activation="relu")(x)
    m = Model(i, x)
    m.compile("sgd", "mse")
    model_path = "my_model"
    m.save(model_path, save_format="tf")
    print("model created")
    return model_path

def convert_to_trt(input_path: str, output_path: str, data_gen=None, precision="FP32"):
    conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
    conversion_params = conversion_params._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_path, conversion_params=conversion_params
    )
    converter.convert()
    print("build engine ...")
    if data_gen:
        converter.build(data_gen)
    converter.save(output_path)
    print("model converted")

if __name__ == "__main__":
    input_path = create_model()
    output_path = "trt_model"
    convert_to_trt(input_path, output_path, gen)
    model = load_model(output_path).signatures["serving_default"]
    for i in range(100000):
        print("round:", i)
        model(data)
Here is part of the output:
build engine ...
2020-06-04 16:16:39.485947: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:982] Building a new TensorRT engine for TRTEngineOp_0 with input shapes: [[1,200,200,3]]
2020-06-04 16:16:39.486004: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1153] Linked TensorRT version: 6.0.1
2020-06-04 16:16:39.486091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-06-04 16:16:39.486109: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1154] Loaded TensorRT version: 6.0.1
2020-06-04 16:16:39.487998: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
model converted
round: 0
2020-06-04 16:16:43.393328: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:982] Building a new TensorRT engine for PartitionedCall/TRTEngineOp_0 with input shapes: [[1,200,200,3]]
I don't see the problem using the test case provided in comment #11.
@bixia1 The problem is that the TRT engine gets built twice because of the different name prefix. The engine rebuild can also happen again after a lot of predictions, but that is not simple to reproduce. In general, this prevents high-throughput use cases.
Also, please take a look at the log files I posted some time ago.
@cbodenst Can you point out which one is a rebuild? I used the command "grep -e "Building a" -e "Finished" log" to process your log and don't see which one is a rebuild. See the file I attached here; they seem to have different shapes. processed.txt
In your processed.txt you can see the rebuild at line 4, timestamp "2020-04-08 11:40:48.633176", 12 minutes after the processing loop started.
I see that the engine is not created successfully: 2020-04-08 11:28:58.582814: W tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:1002] Engine creation for StatefulPartitionedCall/TRTEngineOp_2 failed. The native segment will be used instead. Reason: Unimplemented: Transpose too large:10437228
@cbodenst can you try setting this env variable to see if that help? TF_DEBUG_TRT_ALLOW_INEFFICIENT_TRANSPOSE=1
I already tried that. It helps to build the TRT engine properly, but it still results in rebuilding after some time.
You mean that even though the engine is built successfully, it will be rebuilt after a while? Can you provide a log showing that?
I made another run with TF_DEBUG_TRT_ALLOW_INEFFICIENT_TRANSPOSE=1. You can see that the engine gets rebuilt at timestamp 2020-06-22 15:24:16.430878. The rebuild also fails sometimes, as you can see in the logs. log.zip
@bixia1 Are you still on this? I would really like to use TF-TRT, but if this is not working I need to invest time into alternatives. Didn't @nrothGIT describe what the issue is in his very first post? Or is that unrelated?
@cbodenst Yes, I will look at this again soon.