tensorrt
No SpeedUp after TensorRT INT8
Description
I converted my model to a TensorRT engine using TF-TRT in INT8 mode. However, inference speed is the same as with FP32 and FP16. I also tried setting minimum_segment_size to 2, 3, and 5, but that did not help either: the speed is always the same regardless of minimum_segment_size or precision mode.
I use the following code to check how many nodes were optimized or replaced with TRT nodes:
print("graph_size(MB)(native_tf): %.1f" % (float(graph_size) / (1 << 20)))
print("graph_size(MB)(trt): %.1f" % (float(len(engine_graph.SerializeToString())) / (1 << 20)))
print("num_nodes(native_tf): %d" % num_nodes)
print("num_nodes(tftrt_total): %d" % len(engine_graph.node))
print("num_nodes(trt_only): %d" % len([1 for n in engine_graph.node if str(n.op) == 'TRTEngineOp']))
The output is:
graph_size(MB)(native_tf): 52.6
graph_size(MB)(trt): 52.8
num_nodes(native_tf): 4243
num_nodes(tftrt_total): 3429
num_nodes(trt_only): 76
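To put those counts in perspective, here is a small arithmetic sketch using only the numbers printed above. The "absorbed" figure is an estimate: it assumes the node count changed only because segments were replaced by TRTEngineOp nodes, which may not hold exactly since other graph optimizations also run during conversion.

```python
# Node counts copied from the output above.
num_nodes_native = 4243   # num_nodes(native_tf)
num_nodes_total = 3429    # num_nodes(tftrt_total)
num_trt_ops = 76          # num_nodes(trt_only)

# Nodes in the converted graph that are still ordinary TensorFlow ops.
remaining_native = num_nodes_total - num_trt_ops

# Rough estimate of how many original nodes were absorbed into engines.
# ASSUMPTION: conversion changed the node count only by replacing segments.
absorbed_estimate = num_nodes_native - remaining_native

print("remaining native nodes: %d" % remaining_native)
print("estimated absorbed nodes: %d (%.1f%% of original)"
      % (absorbed_estimate, 100.0 * absorbed_estimate / num_nodes_native))
print("estimated average nodes per engine: %.1f"
      % (float(absorbed_estimate) / num_trt_ops))
```

Under that assumption, most of the graph still runs as native TensorFlow ops, split across many small engines.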
The log from the convert() call is as follows:
2020-02-18 04:06:40.806897: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 76
2020-02-18 04:06:41.444418: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 10 nodes succeeded.
2020-02-18 04:06:41.444539: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 19 nodes succeeded.
2020-02-18 04:06:41.444757: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 17 nodes succeeded.
2020-02-18 04:06:41.444921: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 17 nodes succeeded.
.....
2020-02-18 04:06:42.007572: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_24_native_segment
2020-02-18 04:06:42.007581: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.919ms.
2020-02-18 04:06:42.007587: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] layout: Graph size after: 20 nodes (0), 19 edges (0), time = 0.652ms.
2020-02-18 04:06:42.007593: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.765ms.
2020-02-18 04:06:42.007599: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] TensorRTOptimizer: Graph size after: 20 nodes (0), 19 edges (0), time = 0.095ms.
2020-02-18 04:06:42.007605: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.828ms.
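Since the convert() log reports how many nodes each segment contains, the per-engine sizes can be pulled out with a regular expression. This is a minimal sketch, not part of TF-TRT: the helper name `segment_sizes` is hypothetical, and the sample text contains only the four segment lines visible in the log above, with timestamps and file prefixes trimmed.

```python
import re

# Matches the "TensorRT node ... added for segment ..." lines of the convert() log.
SEGMENT_RE = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) consisting of (\d+) nodes"
)

def segment_sizes(log_text):
    """Return {engine_name: node_count} for each converted segment in log_text."""
    return {m.group(1): int(m.group(3)) for m in SEGMENT_RE.finditer(log_text)}

# Only the four segment lines shown in the log; the elided ones are not guessed.
sample_log = """
TensorRT node TRTEngineOp_0 added for segment 0 consisting of 10 nodes succeeded.
TensorRT node TRTEngineOp_1 added for segment 1 consisting of 19 nodes succeeded.
TensorRT node TRTEngineOp_2 added for segment 2 consisting of 17 nodes succeeded.
TensorRT node TRTEngineOp_3 added for segment 3 consisting of 17 nodes succeeded.
"""

sizes = segment_sizes(sample_log)
total_listed = sum(sizes.values())
for name in sorted(sizes):
    print("%s: %d nodes" % (name, sizes[name]))
print("total nodes in listed segments: %d" % total_listed)
```

Running it over the full log would show whether the 76 segments are all this small, which is one way to judge how fragmented the converted graph is.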
The log from the calibrate() call is as follows:
2020-02-18 04:06:43.967599: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e44009490
2020-02-18 04:06:43.967669: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-18 04:06:43.968091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-02-18 04:06:58.017542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-18 04:06:58.058014: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c008780
2020-02-18 04:06:58.079357: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e74007ed0
2020-02-18 04:06:58.152381: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c01f3a0
2020-02-18 04:06:58.188232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-18 04:06:58.188977: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e5006f210
2020-02-18 04:06:58.258622: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c0279a0
2020-02-18 04:06:58.301596: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c027860
2020-02-18 04:06:58.382948: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c03b280
2020-02-18 04:06:58.432091: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c03fea0
2020-02-18 04:06:58.467576: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e5007d720
....
2020-02-18 04:07:12.796185: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0b45c0
2020-02-18 04:07:12.836554: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0bbc70
2020-02-18 04:07:13.362552: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0c3520
2020-02-18 04:07:14.132879: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0c48f0
2020-02-18 04:07:14.132989: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e4c001180
2020-02-18 04:07:14.331160: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0e3660
The log during inference is as follows:
2020-02-18 04:10:37.066778: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_0 input shapes: [[1,8192,3]]
2020-02-18 04:10:37.066912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-18 04:10:37.067393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-02-18 04:10:55.899338: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_2/TRTEngineOp_14 input shapes: [[1,256,3]]
2020-02-18 04:10:55.916907: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_1/TRTEngineOp_10 input shapes: [[1,1024,3]]
2020-02-18 04:10:55.988548: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_0/TRTEngineOp_6 input shapes: [[1,8192,3]]
2020-02-18 04:10:56.048204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-18 04:10:56.049951: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_57 input shapes: [[1,64,8192,4]]
2020-02-18 04:10:56.470749: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_58 input shapes: [[1,64,8192,2]]
....
....
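The inference log shows engines being built during the first inference call (the "Building a new TensorRT engine" entries span roughly 19 seconds, from 04:10:37 to 04:10:56). If those first calls were included in the timing, build time could mask any difference between precision modes. Below is a minimal timing sketch that discards warm-up calls before measuring. The `benchmark` helper and the `fake_inference` stand-in are hypothetical; in practice `run_once` would wrap the real inference call, which is not shown in this report.

```python
import time

def benchmark(run_once, warmup=10, iters=50):
    """Return mean seconds per call of run_once, excluding warm-up calls."""
    # Warm-up: lets one-time work (such as engine building) finish untimed.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters

# Stand-in workload so the sketch runs on its own (ASSUMPTION: replace with
# a function that performs one real inference call).
calls = []

def fake_inference():
    calls.append(1)
    sum(range(1000))

mean_s = benchmark(fake_inference, warmup=3, iters=20)
print("mean latency: %.3f ms" % (mean_s * 1e3))
```

Measuring each precision mode with the same warm-up and iteration counts makes the FP32, FP16, and INT8 numbers directly comparable.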
Environment
TensorRT Version: 6
GPU Type: GTX 1660Ti
Nvidia Driver Version: 440.59
CUDA Version: 10.1
CUDNN Version: 7.6.3
Operating System + Version: ubuntu 16.04
Python Version (if applicable): 3.5
TensorFlow Version (if applicable): 1.15.0