Shape output bug in dynamic shape mode
If we enable dynamic shape mode and any of the TRT segments has an output produced by a Shape op, then conversion crashes the application.
This problem occurs in practice for MobileNet and U-Net, and could happen in other cases as well: it is a frequent pattern that Shape is followed by the TRT-incompatible DataFormatVecPermute, so the network ends with a shape output.
Here is a reproducer:
#!/usr/bin/env python
# coding: utf-8
import os
os.environ["TF_CPP_VMODULE"] = "trt_logger=2,trt_engine_utils=2,trt_engine_op=2,convert_nodes=2,convert_graph=2,segment=2,trt_shape_optimization_profiles=2,trt_engine_resource_ops=2"
# os.environ["TF_TRT_OP_DENYLIST"] = "Shape"  # Uncomment this line to avoid the error

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
from tensorflow.python.ops import array_ops


@tf.function
def my_func(x):
    q = x + 1
    q_shape = array_ops.shape(q)
    return array_ops.identity(q_shape, name="output_0")


cfunc = my_func.get_concrete_function(tf.TensorSpec([None, None], tf.float32))
module = tf.Module()
module.myfunc = my_func
tf.saved_model.save(module, '/tmp/models/my_function', signatures=cfunc)

conv_params = trt.TrtConversionParams(minimum_segment_size=2)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/tmp/models/my_function",
    conversion_params=conv_params, use_dynamic_shape=True)
converter.convert()


def input_fn():
    x = np.arange(15).astype(np.float32).reshape(5, 3)
    yield (x,)


converter.build(input_fn)
converter.save("/tmp/models/trt_model")
print("Model exported to TRT")
Output:
...trt_engine_op.cc:755] Native segment is used during collecting shapes for profiles
...trt_engine_op.cc:329] Constructing function handle
...trt_engine_op.cc:532] Executing native segment: TRTEngineOp_0_0
...trt_engine_op.cc:542] Native Segment completed
...E tensorflow/stream_executor/cuda/cuda_driver.cc:1182] failed to enqueue async memcpy from device to host: CUDA_ERROR_INVALID_VALUE: invalid argument; host dst: 0x7faf6b000000; GPU src: 0x7fb378010740; size: 8=0x8
...F tensorflow/core/common_runtime/gpu/gpu_util.cc:291] GPU->CPU Memcpy failed
Aborted (core dumped)
My hypothesis is that the Shape op is expected to produce output in CPU memory, and this is not respected by the native segment that we create.
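The hypothesis can be checked in eager mode. A minimal sketch, assuming a GPU is visible: device reports where the op was placed, backing_device reports where the tensor's memory actually resides, and if the hypothesis holds the two differ for the Shape result.

import tensorflow as tf

with tf.device("/GPU:0"):
    x = tf.random.uniform([5, 3])
    s = tf.shape(x)  # GPU ShapeOp

print(s.device)          # op placed on .../device:GPU:0
print(s.backing_device)  # expected .../device:CPU:0 if the output lives in host memory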
A simple workaround is to disable conversion of the Shape op with os.environ["TF_TRT_OP_DENYLIST"] = "Shape". A better solution would be to fix native segment execution.
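For reference, a minimal sketch of applying the workaround to the reproducer above (assumption: TF_TRT_OP_DENYLIST is read when conversion runs, so it has to be set before the converter is invoked, as in the commented-out line of the script):

import os
os.environ["TF_TRT_OP_DENYLIST"] = "Shape"  # keep Shape out of TRT segments

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/tmp/models/my_function",
    conversion_params=trt.TrtConversionParams(minimum_segment_size=2),
    use_dynamic_shape=True)
converter.convert()  # Shape stays in the TF graph, so no segment ends with a shape output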
The problem is that the GPU ShapeOp is registered with its output in host memory, see here.
Now, how can we represent such a TRTEngineOp, where the output is in GPU memory when executed by TRT, but in CPU memory when executed by the native segment?
Other possible solutions: insert another op that always places the output in GPU memory, or somehow force the native segment to execute on CPU.
The problem still exists. Here is a modified reproducer, which adds a TRT-incompatible op (DataFormatVecPermute) followed by other TRT-compatible ops:
#!/usr/bin/env python
# coding: utf-8
import os
os.environ["TF_CPP_VMODULE"] = "trt_logger=2,trt_engine_utils=2,trt_engine_op=2,convert_nodes=2,convert_graph=2,segment=2,trt_shape_optimization_profiles=2,trt_engine_resource_ops=2"
# os.environ["TF_TRT_OP_DENYLIST"] = "Shape"  # Uncomment this line to avoid the error

import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import numpy as np
from tensorflow.python.ops import array_ops


@tf.function
def my_func(x):
    q = x + 1
    q = array_ops.shape(q)
    q = tf.raw_ops.DataFormatVecPermute(
        x=q, src_format='NHWC', dst_format='NCHW')
    q = q * 2 + q * q
    return array_ops.identity(q, name="output_0")


cfunc = my_func.get_concrete_function(
    tf.TensorSpec([None, None, None, None], tf.float32))
module = tf.Module()
module.myfunc = my_func
tf.saved_model.save(module, '/tmp/models/my_function', signatures=cfunc)

conv_params = trt.TrtConversionParams(minimum_segment_size=1)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/tmp/models/my_function",
    conversion_params=conv_params, use_dynamic_shape=True)
converter.convert()


def input_fn():
    x = np.arange(15).astype(np.float32).reshape(1, 1, 5, 3)
    yield (x,)


converter.build(input_fn)
converter.save("/tmp/models/trt_model")
print("Model exported to TRT")
This results in a segfault during/after native segment execution:
2021-06-16 13:03:22.937365: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:538] Executing native segment: TRTEngineOp_0_1
Thread 292 "python" received signal SIGSEGV, Segmentation fault.