
TensorRT FPS lower than PyTorch

Open Jiang-Stan opened this issue 2 years ago • 2 comments

Hi, I'm trying to run PointPillars on TensorRT, and I am confused about its performance. [screenshot: 2022-06-20 13-25-43] The structure of my model is the same as this ONNX graph, which is part of the pre-processing of PointPillars, a point-cloud model. I applied the matmul converter mentioned in https://github.com/NVIDIA-AI-IOT/torch2trt/issues/587, and I have set max_batch_size to 10000. The input shape of the model is [10000, 32, 10] because the points in the point cloud are processed in parallel. But my test shows it takes 12.5 ms on PyTorch and 25 ms on TensorRT, so running this model on TensorRT is slower than running it in PyTorch on the GPU, both in float32.
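One torch2trt knob worth ruling out (it is not mentioned in the post, so this is an assumption about the setup): by default the engine is built in float32, and enabling fp16_mode often changes the comparison substantially. A minimal sketch of the conversion call; the model here is a hypothetical stand-in, the [10000, 32, 10] shape comes from the post, and the import guard only exists so the sketch runs on machines without TensorRT installed:

```python
import importlib.util

def build_engine(model, example_input):
    # If torch2trt is unavailable (e.g. no TensorRT on this machine),
    # fall back to the original model so the sketch still runs.
    if importlib.util.find_spec("torch2trt") is None:
        return model
    from torch2trt import torch2trt
    # fp16_mode=True lets TensorRT pick half-precision kernels;
    # max_batch_size matches the 10000-pillar batch dimension.
    return torch2trt(model, [example_input], fp16_mode=True, max_batch_size=10000)

# usage (hypothetical): engine = build_engine(pillar_encoder.eval().cuda(),
#                                             torch.randn(10000, 32, 10).cuda())
```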

Could anyone tell me why this happens? Are there any operators in this graph that are not TensorRT-friendly?

Thanks in advance; I hope to hear your reply.
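Numbers like the 12.5 ms vs 25 ms above are easy to distort with a cold-start or unsynchronized benchmark. A minimal, framework-agnostic timing sketch (the helper name and iteration counts are mine, not from the thread); note that for a CUDA model the timed callable must end with `torch.cuda.synchronize()`, otherwise you are timing kernel launches, not execution:

```python
import time

def avg_latency_ms(fn, warmup=10, iters=100):
    """Average latency of fn() in milliseconds, after warm-up runs."""
    for _ in range(warmup):      # warm-up: JIT, autotuning, caches
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

# usage with a cheap stand-in for model(x)
latency = avg_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms")
```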

Jiang-Stan avatar Jun 20 '22 05:06 Jiang-Stan

Hi @Jiang-Stan ,

Thanks for reaching out!

Have you tried the following project?

https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars

I don't currently have experience with PointPillars, but perhaps this will help.

Please let me know if this helps, or you have any questions.

Best, John

jaybdub avatar Jun 22 '22 15:06 jaybdub

I met the same problem with the X3D model from pytorchvideo: the TensorRT model is about 2.5× slower than the PyTorch model. Has anyone seen similar performance with a similar model?

hongsamvo avatar Jul 18 '22 07:07 hongsamvo