torch2trt
TensorRT FPS lower than PyTorch
Hi, I'm trying to run PointPillars on TensorRT, and I am confused about its performance.
The structure of my model is the same as this ONNX graph, which is part of the pre-processing stage of PointPillars, a point-cloud model. I applied the matmul workaround mentioned in https://github.com/NVIDIA-AI-IOT/torch2trt/issues/587, and I have set max_batch_size to 10000. The input shape of the model is [10000, 32, 10] because the points in the point cloud are processed in parallel. But my test shows it takes 12.5 ms on PyTorch and 25 ms on TensorRT, respectively; running this model on TensorRT is slower than running it in PyTorch on the GPU with float32.
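For reference, GPU timing comparisons like this are easy to get wrong because CUDA kernel launches are asynchronous: without an explicit `torch.cuda.synchronize()` before reading the clock, the measured PyTorch time can be much lower than the real one. Below is a minimal timing-harness sketch (the `benchmark` helper is hypothetical, not the exact script used here) showing the warm-up and synchronization that a fair PyTorch-vs-TensorRT comparison needs:

```python
import time

import torch


def benchmark(model, x, n_warmup=10, n_iters=100):
    """Return average latency per forward pass in milliseconds.

    Warm-up iterations exclude one-time costs (kernel selection,
    allocator growth); torch.cuda.synchronize() ensures the timer
    only stops after all queued GPU work has actually finished.
    """
    with torch.no_grad():
        for _ in range(n_warmup):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters * 1000.0
```

The same harness can time both the original module and the torch2trt-converted module on identical input tensors, so any remaining gap reflects the engines themselves rather than measurement skew.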
Could anyone tell me why this happens? Are there any operators in this graph that are not TensorRT-friendly?
Thanks in advance, hoping to hear your reply.
Hi @Jiang-Stan ,
Thanks for reaching out!
Have you tried the following project?
https://github.com/NVIDIA-AI-IOT/CUDA-PointPillars
I don't currently have experience with PointPillars, but perhaps this will help.
Please let me know if this helps, or you have any questions.
Best, John
I met the same problem with the X3D model from PyTorchVideo: the TensorRT model is slower than the torch model (about 2.5×). Has anyone seen the same performance issue with a similar model?