tensorflow-yolov4-tflite
Slow performance with TensorRT on Jetson Xavier NX
I converted the YOLOv4 model to TensorRT and ran the demo on a video, but performance was very poor: about 2 FPS on most frames, with occasional frames spiking to roughly 15-19 FPS (see output below).
Running the same demo with YOLOv4-tiny, I got ~20-23 FPS.
Any thoughts on what could be causing this?
python3 detect_video.py --weights ./checkpoints/yolov4-trt-fp16-416 --model yolov4 --video ./data/video/video.mp4 --framework trt
2020-08-21 15:58:45.759953: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-21 15:58:53.013271: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-21 15:58:53.025845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.026066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.58GiB deviceMemoryBandwidth: 66.10GiB/s
2020-08-21 15:58:53.026269: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-21 15:58:53.026493: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-21 15:58:53.026635: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-21 15:58:53.029433: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-21 15:58:53.038766: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-21 15:58:53.045534: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-21 15:58:53.045842: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-21 15:58:53.046590: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.047392: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.047524: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-21 15:58:53.090421: W tensorflow/core/platform/profile_utils/cpu_utils.cc:106] Failed to find bogomips or clock in /proc/cpuinfo; cannot determine CPU frequency
2020-08-21 15:58:53.092386: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f71c80 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-21 15:58:53.092492: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-21 15:58:53.193881: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.194361: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f355c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-21 15:58:53.194804: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Xavier, Compute Capability 7.2
2020-08-21 15:58:53.196612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.196837: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.58GiB deviceMemoryBandwidth: 66.10GiB/s
2020-08-21 15:58:53.197067: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-21 15:58:53.197156: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-21 15:58:53.197232: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-21 15:58:53.197437: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-21 15:58:53.197581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-21 15:58:53.197742: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-21 15:58:53.197818: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-21 15:58:53.198251: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.198729: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:53.198860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-21 15:58:53.199457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-21 15:58:55.500061: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-21 15:58:55.500206: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-21 15:58:55.500252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-21 15:58:55.501785: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.502399: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.502667: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2692 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-08-21 15:58:55.670082: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.670309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:00:00.0 name: Xavier computeCapability: 7.2
coreClock: 1.109GHz coreCount: 6 deviceMemorySize: 7.58GiB deviceMemoryBandwidth: 66.10GiB/s
2020-08-21 15:58:55.670586: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.2
2020-08-21 15:58:55.670718: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-21 15:58:55.670838: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-21 15:58:55.671045: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-21 15:58:55.671203: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-21 15:58:55.671335: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-21 15:58:55.671407: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-08-21 15:58:55.671860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.672347: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.672443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-08-21 15:58:55.672527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-21 15:58:55.672563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-08-21 15:58:55.672602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-08-21 15:58:55.673101: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.673666: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:948] ARM64 does not support NUMA - returning NUMA node zero
2020-08-21 15:58:55.673793: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2692 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-08-21 16:01:16.389396: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1204] Linked TensorRT version: 7.1.3
2020-08-21 16:01:16.705409: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer.so.7
2020-08-21 16:01:16.715170: I tensorflow/compiler/tf2tensorrt/convert/convert_nodes.cc:1205] Loaded TensorRT version: 7.1.3
2020-08-21 16:01:16.736411: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libnvinfer_plugin.so.7
FPS: 0.00
FPS: 1.38
FPS: 1.78
FPS: 1.59
FPS: 1.71
FPS: 1.97
FPS: 1.86
FPS: 1.81
FPS: 1.70
FPS: 1.83
FPS: 1.78
FPS: 1.78
FPS: 16.84
FPS: 2.05
FPS: 16.57
FPS: 1.87
FPS: 1.80
FPS: 1.88
FPS: 1.85
FPS: 16.94
FPS: 1.95
FPS: 2.03
FPS: 1.85
FPS: 1.78
FPS: 2.06
FPS: 2.04
FPS: 1.85
FPS: 1.81
FPS: 1.91
FPS: 2.16
FPS: 1.90
FPS: 1.89
FPS: 1.96
FPS: 1.91
FPS: 1.94
FPS: 1.83
FPS: 1.81
FPS: 1.81
FPS: 2.02
FPS: 1.92
FPS: 1.85
FPS: 18.64
FPS: 1.87
FPS: 1.97
FPS: 2.02
FPS: 2.06
FPS: 2.13
FPS: 1.98
FPS: 1.89
FPS: 1.88
FPS: 1.93
FPS: 2.02
FPS: 2.05
FPS: 1.92
FPS: 1.87
FPS: 15.87
FPS: 1.94
FPS: 2.13
FPS: 1.92
FPS: 1.89
FPS: 1.92
FPS: 2.15
FPS: 1.99
FPS: 1.99
FPS: 2.01
FPS: 1.99
FPS: 1.93
FPS: 1.84
FPS: 2.15
FPS: 2.00
FPS: 1.97
FPS: 2.04
FPS: 2.11
FPS: 1.96
FPS: 1.94
FPS: 1.90
FPS: 2.13
FPS: 2.01
FPS: 1.79
FPS: 1.93
FPS: 1.93
FPS: 1.91
FPS: 1.80
FPS: 1.84
FPS: 1.96
FPS: 1.98
FPS: 1.86
FPS: 1.94
FPS: 18.01
FPS: 18.29
FPS: 1.96
FPS: 2.19
FPS: 2.00
FPS: 1.96
FPS: 1.96
FPS: 2.08
FPS: 17.03
FPS: 16.81
FPS: 2.03
FPS: 2.04
FPS: 1.84
FPS: 2.02
FPS: 2.13
FPS: 1.97
FPS: 1.87
FPS: 2.06
FPS: 2.12
FPS: 2.07
FPS: 1.92
FPS: 1.93
FPS: 1.93
FPS: 1.83
FPS: 18.25
FPS: 1.90
FPS: 2.15
FPS: 2.01
FPS: 1.89
FPS: 1.91
FPS: 2.02
FPS: 2.09
FPS: 1.97
FPS: 2.08
FPS: 2.31
FPS: 1.92
FPS: 1.85
FPS: 16.55
FPS: 1.90
FPS: 2.10
FPS: 2.01
FPS: 1.92
FPS: 1.98
FPS: 2.15
FPS: 1.86
FPS: 1.96
FPS: 1.79
FPS: 16.92
FPS: 2.08
FPS: 1.91
FPS: 1.86
FPS: 1.81
FPS: 1.87
FPS: 1.97
FPS: 2.07
FPS: 1.89
FPS: 2.10
FPS: 15.99
FPS: 17.36
FPS: 1.98
FPS: 2.16
FPS: 1.99
FPS: 1.88
FPS: 1.97
FPS: 1.93
FPS: 1.99
FPS: 1.93
FPS: 2.01
FPS: 2.04
FPS: 1.85
FPS: 1.84
FPS: 2.08
FPS: 1.79
FPS: 1.93
FPS: 1.91
FPS: 1.88
FPS: 2.05
FPS: 1.98
FPS: 1.92
FPS: 1.98
FPS: 2.05
FPS: 1.89
FPS: 1.90
FPS: 1.97
FPS: 2.03
FPS: 1.94
FPS: 1.93
FPS: 2.01
FPS: 2.04
FPS: 1.95
FPS: 1.88
FPS: 17.99
FPS: 1.87
FPS: 2.01
FPS: 1.98
FPS: 1.85
FPS: 1.93
FPS: 1.98
FPS: 2.18
FPS: 1.89
FPS: 2.09
FPS: 2.09
FPS: 15.49
FPS: 1.99
FPS: 1.90
FPS: 17.84
FPS: 1.84
FPS: 1.86
FPS: 2.19
FPS: 16.82
FPS: 1.85
FPS: 1.78
FPS: 1.84
FPS: 1.90
FPS: 16.37
FPS: 2.03
FPS: 1.78
FPS: 1.91
FPS: 1.82
FPS: 2.12
FPS: 1.97
FPS: 1.85
FPS: 18.49
FPS: 1.86
FPS: 1.79
FPS: 1.95
FPS: 2.01
FPS: 1.95
FPS: 1.85
FPS: 2.15
FPS: 1.97
FPS: 1.96
FPS: 2.02
FPS: 2.13
FPS: 1.97
FPS: 1.96
FPS: 2.03
FPS: 2.09
FPS: 1.84
FPS: 1.88
FPS: 2.01
FPS: 2.01
FPS: 1.99
FPS: 1.93
FPS: 1.97
FPS: 1.95
FPS: 2.07
FPS: 1.93
FPS: 2.00
FPS: 2.01
FPS: 1.96
FPS: 1.94
FPS: 1.99
FPS: 1.94
FPS: 2.01
FPS: 1.96
FPS: 18.13
FPS: 2.05
FPS: 2.04
FPS: 1.87
FPS: 2.04
FPS: 2.06
FPS: 2.14
FPS: 17.42
FPS: 2.04
FPS: 1.97
FPS: 1.97
FPS: 2.01
FPS: 1.99
FPS: 1.85
FPS: 2.15
FPS: 2.08
FPS: 1.99
FPS: 1.92
FPS: 2.15
FPS: 1.92
FPS: 1.94
FPS: 2.05
FPS: 2.02
FPS: 1.89
FPS: 1.91
FPS: 2.11
FPS: 2.14
FPS: 1.86
FPS: 1.82
FPS: 2.02
FPS: 2.26
FPS: 1.98
FPS: 1.88
FPS: 2.24
FPS: 1.97
FPS: 1.91
FPS: 2.00
FPS: 2.07
FPS: 1.85
FPS: 1.90
FPS: 1.91
FPS: 2.03
FPS: 1.99
FPS: 1.73
FPS: 1.89
FPS: 2.10
FPS: 1.95
FPS: 1.84
FPS: 2.09
FPS: 2.02
FPS: 2.09
FPS: 16.57
FPS: 18.50
FPS: 1.97
FPS: 1.96
FPS: 1.92
FPS: 1.93
FPS: 2.11
FPS: 1.91
FPS: 2.15
FPS: 2.00
FPS: 1.86
FPS: 2.06
FPS: 2.30
FPS: 1.96
FPS: 1.92
FPS: 1.98
FPS: 2.15
FPS: 2.12
FPS: 2.09
FPS: 1.84
FPS: 2.15
FPS: 1.95
FPS: 2.00
FPS: 2.07
Video has ended or failed, try a different video format!
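For anyone comparing runs, the per-frame `FPS:` lines above are easier to reason about as a summary than as a raw list. Below is a minimal sketch, assuming the console output has been captured as text. The function name `summarize_fps` and the `warmup` and `spike_factor` parameters are hypothetical helpers for illustration, not part of `detect_video.py`, and the sample is a short excerpt of lines taken from the output above.

```python
import re
import statistics

# Matches lines of the form "FPS: 1.78" as printed in the output above.
FPS_LINE = re.compile(r"^FPS:\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def summarize_fps(log_text, warmup=1, spike_factor=4.0):
    """Summarize 'FPS: x.xx' lines from captured console output.

    warmup: number of leading FPS readings to skip (the first reading is 0.00).
    spike_factor: a reading counts as a spike if it exceeds
        spike_factor * median (an arbitrary threshold chosen for this sketch).
    """
    values = []
    for line in log_text.splitlines():
        match = FPS_LINE.match(line.strip())
        if match:
            values.append(float(match.group(1)))

    values = values[warmup:]
    if not values:
        raise ValueError("no FPS lines found after warm-up")

    median = statistics.median(values)
    spikes = [v for v in values if v > spike_factor * median]
    return {
        "frames": len(values),
        "median_fps": median,
        "mean_fps": statistics.mean(values),
        "spike_count": len(spikes),
    }


# A few lines copied from the output above, used only as a smoke test.
sample = """FPS: 0.00
FPS: 1.38
FPS: 1.78
FPS: 16.84
FPS: 2.05
"""

summary = summarize_fps(sample)
print(summary)
```

The median is used instead of the mean as the headline number because the occasional high readings would otherwise inflate it. Reporting the spike count separately makes it easier to tell whether those frames follow a pattern.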
Did you solve this issue, @GOBish?
nope
@GOBish How did you convert the model to TRT? I assume you are using TensorFlow-TRT, is that right?
Hi @GOBish -- I would also be curious how you obtained those numbers. Using TRT?