TensorRT
Performance of concurrent inference with two different models
Description
I have two different models, both converted to TensorRT engines. When I run them serially, the inference-only times are:
//10 times
do_infer >> cost 400.60 msec. //warm-up
do_infer >> cost 42.22 msec.
do_infer >> cost 11.69 msec.
do_infer >> cost 37.12 msec.
do_infer >> cost 9.97 msec.
do_infer >> cost 33.33 msec.
do_infer >> cost 9.87 msec.
do_infer >> cost 34.53 msec.
do_infer >> cost 9.96 msec.
do_infer >> cost 34.66 msec.
do_infer >> cost 10.88 msec.
do_infer >> cost 35.41 msec.
do_infer >> cost 11.10 msec.
do_infer >> cost 33.84 msec.
do_infer >> cost 10.00 msec.
do_infer >> cost 33.42 msec.
do_infer >> cost 10.08 msec.
do_infer >> cost 34.65 msec.
do_infer >> cost 10.63 msec.
do_infer >> cost 34.66 msec.
When the two models run in parallel on two threads:
do_infer >> cost 408.90 msec. //warm-up
do_infer >> cost 407.14 msec. //warm-up
do_infer >> cost 11.41 msec.
do_infer >> cost 50.40 msec.
do_infer >> cost 13.05 msec.
do_infer >> cost 50.36 msec.
do_infer >> cost 44.26 msec.
do_infer >> cost 43.02 msec.
do_infer >> cost 44.29 msec.
do_infer >> cost 43.25 msec.
do_infer >> cost 50.69 msec.
do_infer >> cost 49.08 msec.
do_infer >> cost 48.10 msec.
do_infer >> cost 47.28 msec.
do_infer >> cost 50.19 msec.
do_infer >> cost 48.67 msec.
do_infer >> cost 47.18 msec.
do_infer >> cost 46.64 msec.
do_infer >> cost 12.24 msec.
do_infer >> cost 46.06 msec.
Why doesn't running the two models in parallel perform better than running them serially?
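For reference, the timings above come from bracketing each call with a wall-clock timer. Below is a minimal self-contained sketch of that harness; the sleep is only a stand-in for the actual TensorRT enqueue plus stream synchronize, and `do_infer`, `run_serial`, and `run_parallel` are illustrative names, not the real code:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Hypothetical timing wrapper matching the "do_infer >> cost ..." log lines.
// In the real program the body would be the TensorRT enqueue + stream sync;
// here a sleep stands in for the GPU work so the harness is self-contained.
static double do_infer(int work_ms) {
    auto t0 = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    std::printf("do_infer >> cost %.2f msec.\n", ms);
    return ms;
}

// Serial run: the two models back to back on one thread.
static void run_serial() {
    for (int i = 0; i < 3; ++i) { do_infer(10); do_infer(35); }
}

// Parallel run: one thread per model, as in the two-thread experiment.
static void run_parallel() {
    std::thread a([] { for (int i = 0; i < 3; ++i) do_infer(10); });
    std::thread b([] { for (int i = 0; i < 3; ++i) do_infer(35); });
    a.join();
    b.join();
}
```

Note that in the real program each thread must use its own `IExecutionContext` and its own `cudaStream_t`; sharing either between threads serializes the GPU work regardless of how many CPU threads issue it.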
Environment
TensorRT Version: 8.4.3.1
NVIDIA GPU: RTX3070
NVIDIA Driver Version: 470.74
CUDA Version: 11.4
CUDNN Version: 8.5.0
Operating System: Ubuntu 18.04
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):