yolo-tensorrt
Pre processing, Inference and Post processing are slow
Hi @enazoe, I'm currently using yolov5 and trying different batch sizes. I'm seeing long inference times, and the preprocessing and NMS stages are also really slow. I've tested on an i7 8th gen CPU, an NVIDIA RTX 2080 Ti, and 16 GB of RAM.
I've already seen issue #99
In the image you can see the timings I'm obtaining; they are really not comparable with the PyTorch implementation.
I don't understand whether I'm doing something wrong. Is it possible that TensorRT is running on the CPU? Thanks in advance.
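One way to sanity-check this is to compare CUDA-event time against wall-clock time around each stage: TensorRT execution itself runs on the GPU, while the pre/post processing around it is likely CPU-side, so a near-zero GPU time with a large wall time points at a CPU-bound stage. A minimal sketch (the `timeStage` helper is illustrative, not part of this repo's API, and assumes work is enqueued on the default CUDA stream):

```cpp
#include <cuda_runtime_api.h>
#include <chrono>
#include <cstdio>

// Hypothetical helper: measures a stage both on the GPU (CUDA events) and
// on the CPU (wall clock). A near-zero GPU time with a large wall time
// means the stage is CPU-bound. Caveat: events are recorded on the default
// stream; if doInference enqueues on its own stream this will undercount.
template <typename F>
void timeStage(const char* name, F&& stage) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    auto w0 = std::chrono::steady_clock::now();
    cudaEventRecord(start);
    stage();                        // e.g. _p_net->doInference(...)
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    auto w1 = std::chrono::steady_clock::now();

    float gpuMs = 0.f;
    cudaEventElapsedTime(&gpuMs, start, stop);
    double wallMs = std::chrono::duration<double, std::milli>(w1 - w0).count();
    std::printf("%s: gpu %.2f ms, wall %.2f ms\n", name, gpuMs, wallMs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```

For example, `timeStage("inference", [&]{ _p_net->doInference(trtInput.data, vec_ds_images.size()); });` would show whether the inference call actually spends its time on the GPU.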
What is the batch size, and the model width and height?
https://github.com/enazoe/yolo-tensorrt/issues/99#issuecomment-767974798
I'm actually using yolov5l with batch size = 4 and image size = [800, 800]. During inference in PyTorch with my 2080 Ti I was using a batch size of up to 16, with roughly 20 ms of inference per image.
You push 16 images at the same time and get an inference time of 20 ms per image? Also note that yolov5 uses dynamic input.
This is what I usually get
About the dynamic input: does it affect the performance of the model in some way? By the way, I'm always using 800x800 images.
Hi @enazoe, I'm currently trying to run inference on a single 800x800 image with an i7 8th gen CPU, an NVIDIA RTX 2080 Ti, and 16 GB of RAM. My current environment is:
- TensorRT-7.1.3.4
- CUDA 11.0
- OpenCV 3.4.6
I'm measuring the time of the three main functions (timed roughly as in the sketch after this list):
- preprocessing:
  `cv::Mat trtInput = blobFromDsImages(vec_ds_images, _p_net->getInputH(), _p_net->getInputW());`
- inference:
  `_p_net->doInference(trtInput.data, vec_ds_images.size());`
- post processing:
  `auto binfo = _p_net->decodeDetections(i, max_height, max_width);`
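A minimal sketch of that per-stage timing, assuming the variables from the snippets above (note that GPU work is asynchronous, so unless `doInference` synchronizes internally, some of its cost can spill into whichever stage synchronizes next):

```cpp
#include <chrono>
#include <iostream>

// Illustrative per-stage wall-clock timing around the three repo calls.
auto t0 = std::chrono::steady_clock::now();
cv::Mat trtInput = blobFromDsImages(vec_ds_images,
                                    _p_net->getInputH(),
                                    _p_net->getInputW());
auto t1 = std::chrono::steady_clock::now();

_p_net->doInference(trtInput.data, vec_ds_images.size());
auto t2 = std::chrono::steady_clock::now();

auto binfo = _p_net->decodeDetections(i, max_height, max_width);
auto t3 = std::chrono::steady_clock::now();

auto ms = [](auto a, auto b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
};
std::cout << "pre: "   << ms(t0, t1) << " ms, "
          << "infer: " << ms(t1, t2) << " ms, "
          << "post: "  << ms(t2, t3) << " ms\n";
```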
As you can see, the preprocessing and post processing each take roughly twice the inference time. It can't be that my GPU is already saturated by a single image, so could you please give me any hints on how to bring the pre and post processing times down to something more reasonable relative to the inference?
Thanks in advance
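On the preprocessing side, one thing worth trying is OpenCV's batched blob packing, which fuses resize, channel swap, scaling, and NCHW packing into one optimized call. A minimal sketch, assuming the network wants RGB float input scaled to [0, 1] (verify against what `blobFromDsImages` produces; if the repo letterboxes the input, that padding step is not reproduced here):

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <vector>

// Sketch of an alternative preprocessing path using cv::dnn::blobFromImages.
// Assumes NCHW float input, RGB channel order, values scaled to [0, 1];
// confirm this matches blobFromDsImages before swapping it in.
cv::Mat makeBlob(const std::vector<cv::Mat>& images, int netW, int netH) {
    return cv::dnn::blobFromImages(images,
                                   1.0 / 255.0,          // scale to [0, 1]
                                   cv::Size(netW, netH), // network input size
                                   cv::Scalar(),         // no mean subtraction
                                   /*swapRB=*/true,      // BGR -> RGB
                                   /*crop=*/false);
}
```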
Hey, I also have this problem: the function `decodeDetections` is very slow, almost 80 ms on a Jetson NX.
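A common reason a CPU decode like this is slow is that it does full per-anchor work (class scan, box construction) before any thresholding. A hypothetical sketch of the usual fix, filtering on objectness first; the `Detection` struct, output layout, and function shape here are assumptions modeled on yolov5's head, not this repo's actual `decodeDetections`:

```cpp
#include <vector>

// Hypothetical detection record and output layout: one row per anchor of
// [x, y, w, h, objectness, class scores...]. This mirrors yolov5's head
// layout but is NOT taken from this repo's code.
struct Detection {
    float x, y, w, h;
    float score;
    int classId;
};

std::vector<Detection> decodeFast(const float* out, int numAnchors,
                                  int numClasses, float confThresh) {
    std::vector<Detection> dets;
    dets.reserve(256);
    const int stride = 5 + numClasses;
    for (int i = 0; i < numAnchors; ++i) {
        const float* row = out + i * stride;
        const float obj = row[4];
        if (obj < confThresh) continue;   // cheap early exit for most anchors

        // Only scan class scores for anchors that survive the objectness cut.
        int bestClass = 0;
        float bestScore = row[5];
        for (int c = 1; c < numClasses; ++c) {
            if (row[5 + c] > bestScore) { bestScore = row[5 + c]; bestClass = c; }
        }
        const float conf = obj * bestScore;
        if (conf < confThresh) continue;
        dets.push_back({row[0], row[1], row[2], row[3], conf, bestClass});
    }
    return dets;
}
```

Since the vast majority of anchors fall below the objectness threshold, the early `continue` skips the inner class loop almost everywhere, which is typically where most of the decode time goes.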