yolo-tensorrt

Pre-processing, inference and post-processing are slow

Open · moromatt opened this issue 3 years ago · 7 comments

Hi @enazoe, I'm currently using yolov5 and trying different batch sizes. I'm getting long inference times, and pre-processing and NMS are also really slow. I've tested on an i7 8th gen, an NVIDIA RTX 2080 Ti and 16 GB of RAM.

I've already seen issue #99

In the image below you can see the timings I'm obtaining; as you can see, they are really not comparable with the PyTorch implementation.

[screenshot of timings]

I don't understand whether I'm doing something wrong; is it possible that TensorRT is running on the CPU? Thanks in advance.
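A minimal sanity check (just a sketch, assuming only the CUDA runtime is available; it confirms the process sees a CUDA device, not that every kernel actually ran on it) would be to print which device the process is using:

  #include <cstdio>
  #include <cuda_runtime.h>

  // Print the CUDA device the current process would use.
  // If this fails, nothing in the process can be running on the GPU.
  int main() {
      int device = -1;
      cudaError_t err = cudaGetDevice(&device);
      if (err != cudaSuccess) {
          std::printf("cudaGetDevice failed: %s\n", cudaGetErrorString(err));
          return 1;
      }
      cudaDeviceProp prop{};
      cudaGetDeviceProperties(&prop, device);
      std::printf("Active CUDA device %d: %s\n", device, prop.name);
      return 0;
  }

Watching GPU utilization in nvidia-smi while the detector runs would tell the same story.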

moromatt · Mar 11 '21 14:03

What are the batch size and the model input width and height?

enazoe · Mar 12 '21 02:03

https://github.com/enazoe/yolo-tensorrt/issues/99#issuecomment-767974798

enazoe · Mar 12 '21 02:03

I'm currently using yolov5L with batch size = 4 and image size = [800, 800]. During inference in PyTorch with my 2080 Ti I was using a batch size of up to 16, with roughly 20 ms of inference per image.

moromatt · Mar 12 '21 07:03

You push 16 images at the same time and get an inference time of 20 ms per image? Also note that yolov5 uses dynamic input.

enazoe · Mar 12 '21 08:03

You push 16 images at the same time and get an inference time of 20 ms per image? Also note that yolov5 uses dynamic input.

This is what I usually get:

[screenshot of timings]

About the dynamic input, does it affect the performance of the model in some way? By the way, I'm always using 800x800 images.

moromatt · Mar 12 '21 08:03

Hi @enazoe, I'm currently trying to run inference on a single 800x800 image with an i7 8th gen, an NVIDIA RTX 2080 Ti and 16 GB of RAM. My current environment is:

  • TensorRT-7.1.3.4
  • CUDA 11.0
  • OpenCV 3.4.6

I'm measuring the time of the three main functions:

  • preprocessing: cv::Mat trtInput = blobFromDsImages(vec_ds_images, _p_net->getInputH(),_p_net->getInputW());
  • inference: _p_net->doInference(trtInput.data, vec_ds_images.size());
  • post processing: auto binfo = _p_net->decodeDetections(i, max_height, max_width);

[screenshot of timings]

As you can see, the pre-processing and post-processing each take roughly twice the inference time. It can't be that my GPU is already saturated with a single image; could you please give me any hints on how to bring the pre- and post-processing times down to something more normal?
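For reference, this is roughly how I'm timing the three calls (a sketch only; vec_ds_images, _p_net, i, max_height and max_width are set up as in the sample code, and it assumes doInference blocks until the GPU work has finished, otherwise the inference number would be under-reported and the other two inflated):

  #include <chrono>
  #include <iostream>

  using Clock = std::chrono::steady_clock;
  auto ms = [](Clock::time_point a, Clock::time_point b) {
      return std::chrono::duration_cast<std::chrono::milliseconds>(b - a).count();
  };

  auto t0 = Clock::now();
  // pre-processing: build the input blob from the batch of images
  cv::Mat trtInput = blobFromDsImages(vec_ds_images, _p_net->getInputH(), _p_net->getInputW());
  auto t1 = Clock::now();
  // inference: run the TensorRT engine on the blob
  _p_net->doInference(trtInput.data, vec_ds_images.size());
  auto t2 = Clock::now();
  // post-processing: decode the detections for image i
  auto binfo = _p_net->decodeDetections(i, max_height, max_width);
  auto t3 = Clock::now();

  std::cout << "pre: " << ms(t0, t1) << " ms, infer: " << ms(t1, t2)
            << " ms, post: " << ms(t2, t3) << " ms" << std::endl;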

Thanks in advance

moromatt · Mar 18 '21 10:03

Hey, I also have this problem: the function decodeDetections is very, very slow, almost 80 ms on a Jetson NX.

ccccwb · Mar 03 '23 07:03