TensorRT batch inference is much slower when the batch size is larger.
I converted a DeepLabV3+ model to a TensorRT engine with dynamic input size. I found that when I use a larger batch size, the inference speed is much slower.
The test code is below. With batch size 8 I get 7-8 FPS, while with batch size 1 I get about 60 FPS. It feels like batch inference here is processing frames one by one rather than as a batch.
```python
import time

import numpy as np
from mmdeploy_python import Segmentor

segmentor = Segmentor(
    model_path='./deeplabv3plus_dynamic_bud/',
    device_name='cuda',
    device_id=0)
prev_frame_time = 0
new_frame_time = 0
batch_size = 8
im = np.zeros((batch_size, 512, 1024, 3))
while True:
    seg = segmentor.batch(im)
    new_frame_time = time.time()
    fps = 1 / (new_frame_time - prev_frame_time)
    prev_frame_time = new_frame_time
    fps = str(float(fps))
    print(fps)
```
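Note that the loop above times one `segmentor.batch(im)` call per iteration, so the printed FPS is batches per second, not images per second. A minimal sketch of a per-image throughput comparison, mirroring the same `Segmentor` API and call pattern as above (the warm-up count, iteration count, and `uint8` dtype are assumptions for illustration):

```python
import time

import numpy as np
from mmdeploy_python import Segmentor

segmentor = Segmentor(
    model_path='./deeplabv3plus_dynamic_bud/',  # same model directory as above
    device_name='cuda',
    device_id=0)

for batch_size in (1, 8):
    # dtype is assumed here; the snippet above uses the default float64
    im = np.zeros((batch_size, 512, 1024, 3), dtype=np.uint8)

    # warm up so engine/context initialization is not included in the timing
    for _ in range(5):
        segmentor.batch(im)

    n_iters = 20
    start = time.time()
    for _ in range(n_iters):
        segmentor.batch(im)
    elapsed = time.time() - start

    print(f'batch_size={batch_size}: {n_iters * batch_size / elapsed:.1f} images/s')
```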
@lvhan028
You may find this helpful: https://github.com/open-mmlab/mmdeploy/issues/839#issuecomment-1206029364
I added `"is_batched": true` to pipeline.json, but I still get much slower performance with batch_size > 1.
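For reference, a rough sketch of applying that change programmatically; the file path and the exact nesting of the `"is_batched"` key inside pipeline.json are assumptions, so check the linked comment for where it belongs in your config:

```python
import json

# Assumed location: pipeline.json inside the converted model directory used above.
path = './deeplabv3plus_dynamic_bud/pipeline.json'

with open(path) as f:
    cfg = json.load(f)

# Set "is_batched": true on the pipeline tasks. The exact task that needs the
# key depends on your pipeline.json layout; this sketch simply marks them all.
for task in cfg.get('pipeline', {}).get('tasks', []):
    task['is_batched'] = True

with open(path, 'w') as f:
    json.dump(cfg, f, indent=2)
```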
The segmentation model is large enough to saturate your device, so don't expect a large speedup from batch inference.
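The numbers reported above are in fact consistent with this: 7-8 batches/s at batch size 8 works out to roughly 8 × 7-8 ≈ 56-64 images/s, which is about the same per-image throughput as the ~60 FPS measured at batch size 1.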
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.