Native TensorFlow is >6 times faster than TensorRT
Description
My neural network runs much faster with native TensorFlow than with a TensorRT-optimized model:
Images per second with native TF: 4.785973
Images per second with TRT: 0.712366
I get these numbers on a TITAN X, but I can observe the same effect on a Titan V.
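For context, numbers like these come from timing repeated inference calls on a loaded SavedModel. A minimal sketch of how such an images-per-second figure can be measured (this is only an illustration, not the actual test.py from the link; the model path, input shape, and dtype are placeholders):

import time
import numpy as np
import tensorflow as tf

# Placeholder path and input; the real values depend on the model under test.
model = tf.saved_model.load("saved_model")
infer = model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]  # input name varies per model
image = tf.constant(np.random.rand(1, 300, 300, 3).astype(np.float32))

n_runs = 50
start = time.time()
for _ in range(n_runs):
    infer(**{input_name: image})
print("Images per second: %f" % (n_runs / (time.time() - start)))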
Environment
TensorRT Version: 7.0.0
GPU Type: TITAN X (Pascal)
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 7.6.5.32
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): nightly/master (8869157e7f)
PyTorch Version (if applicable): -
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
https://ft.fzi.de/d=e4fa141418484b9e94039230cd7560de
Steps To Reproduce
Download the model and the test script from the link above, extract the archive, and run test.py. Make sure you have the libraries mentioned in the environment section.
I observed something similar on (1) an Nvidia Jetson Nano and (2) an Nvidia RTX 2080 Ti with the ssd_mobilenet_v2_coco_2018_03_29 model. Native TensorFlow is about 3x and 10x faster than the TensorRT-optimized model on the Jetson Nano and the RTX 2080 Ti, respectively.
Steps to Reproduce
- Download the ssd_mobilenet_v2_coco_2018_03_29 model
$ wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
$ tar xf ssd_mobilenet_v2_coco_2018_03_29.tar.gz
- Use TensorRT to optimize the model
$ git clone https://github.com/tensorflow/tensorrt.git
$ cd tensorrt/tftrt/examples/object_detection
$ git submodule update --init
$ ./install_dependencies.sh
MODEL="ssd_mobilenet_v2_coco_2018_03_29"
$ python object_detection.py --input_saved_model_dir /coco/$MODEL/saved_model --output_saved_model_dir /coco/$MODEL/tftrt_model --data_dir /coco/val2017 --annotation_path /coco/annotations/instances_val2017.json --input_size 640 --batch_size 1 --use_trt --precision FP16 --gpu_mem_cap 8192
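If it helps to rule out the example script itself, the conversion above should be roughly equivalent to calling the TF 2.x TF-TRT API directly; a minimal sketch using the same paths and FP16 precision (other flags such as --gpu_mem_cap, which caps TensorFlow's GPU memory, are omitted here):

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# FP16 mirrors the --precision FP16 flag from the command above.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/coco/ssd_mobilenet_v2_coco_2018_03_29/saved_model",
    conversion_params=params,
)
converter.convert()
converter.save("/coco/ssd_mobilenet_v2_coco_2018_03_29/tftrt_model")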
- Clone my fork of TensorRT to get the inference script
$ git clone https://github.com/dloghin/tensorrt.git tensorrt-fork
- Clone the TensorFlow models repo, copy my script, and run:
$ git clone https://github.com/tensorflow/models.git
$ cd models/research && protoc object_detection/protos/*.proto --python_out=.
$ export PYTHONPATH=`pwd`
$ cd object_detection
$ cp ~/tensorrt-fork/tftrt/examples/object_detection/inference_object_detection.py .
$ cp ~/tensorrt-fork/tftrt/examples/object_detection/orange-apple-banana.jpg .
$ python inference_object_detection.py /coco/ssd_mobilenet_v2_coco_2018_03_29/saved_model data/mscoco_label_map.pbtxt orange-apple-banana.jpg
$ python inference_object_detection.py /coco/ssd_mobilenet_v2_coco_2018_03_29/tftrt_model data/mscoco_label_map.pbtxt orange-apple-banana.jpg
For the first run (native TensorFlow) on the RTX 2080 Ti, you should get:
...
Inference time: 4.3128767013549805 s
...
For the second run (optimized with TensorRT) on the RTX 2080 Ti, you should get:
...
Inference time: 47.05435824394226 s
...
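One thing that may matter for these timings: unless engines are pre-built before saving (e.g. via the converter's build()), TF-TRT builds its TensorRT engines lazily on the first inference call, so a single timed call against the converted model can include engine-build time rather than pure inference. A minimal sketch that times the first and second calls separately (the input name, shape, and uint8 dtype are assumptions for this SSD model):

import time
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("/coco/ssd_mobilenet_v2_coco_2018_03_29/tftrt_model")
infer = model.signatures["serving_default"]
input_name = list(infer.structured_input_signature[1].keys())[0]  # input name varies per model
image = tf.constant(np.random.randint(0, 255, size=(1, 640, 640, 3), dtype=np.uint8))

# First call: may include TF-TRT engine building, not just inference.
t0 = time.time()
infer(**{input_name: image})
print("First call: %f s" % (time.time() - t0))

# Second call: closer to steady-state inference time.
t0 = time.time()
infer(**{input_name: image})
print("Second call: %f s" % (time.time() - t0))

If the second call is still much slower than native TensorFlow, warm-up alone does not explain the gap.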
Environment
(For the RTX 2080 Ti, I am running this in a Docker container based on Nvidia's nvcr.io/nvidia/tensorrt:19.10-py3 image.)
GPU: Nvidia RTX 2080 Ti
Host OS: Ubuntu 18.04.4 LTS
Docker Version: 19.03.8
Nvidia Driver: 440.100
Docker Base Image: nvcr.io/nvidia/tensorrt:19.10-py3
CUDA Version: 10.1
Python Version: 3.6.8
TensorFlow Version: 2.3.0
TensorRT Version: 6.0.1
Hi, I am having a similar issue on an Nvidia RTX 2080 Ti and a Jetson Xavier NX. Any idea how to fix this, please? Thanks