
TensorRT for cascaded networks

Open · vsingh-sereact opened this issue 8 months ago · 7 comments

Hi,

Does anyone have experience converting cascaded networks like CascadedMaskRCNN to TensorRT? It's not straightforward to convert, but I would love to know if anyone has given it a shot. I am confused about the pooling operators in general. Please let me know, or point me to any other repo/issue similar to this one.

Thanks!

vsingh-sereact · Apr 11 '25

Have you tried using the ONNX exporter? https://pytorch.org/docs/stable/onnx.html
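For reference, a minimal export sketch (the model class, dummy input shape, output names, and opset version below are placeholders, not details from this thread):

```python
import torch

# Hypothetical model and input; substitute the real cascaded network.
model = MyCascadeMaskRCNN().eval()
dummy_input = torch.randn(1, 3, 800, 800)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["images"],
    output_names=["boxes", "scores", "labels", "masks"],
    opset_version=17,  # assumption; use one your TensorRT version supports
    dynamic_axes={"images": {0: "batch"}},  # only if batch size should be dynamic
)
```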

yuanyao-nv · Apr 22 '25

Yes, I made it work. But it turns out my TensorRT engine is 2x slower than the PyTorch version. The reason is likely one of these:

  1. I am using xformers in the PyTorch version, and it is not supported by TensorRT.
  2. Because I have a cascaded bounding-box regression head, I had to use 3 poolers in the ONNX conversion, while PyTorch just uses a for loop.
  3. I am using EfficientNMS after every box head to get the boxes and pass them to the next box head. Is there a better way to decode the boxes and send them to the next head?
  4. I am using FP32 in the TensorRT conversion. FP16 gives all-zero outputs and throws a warning about normalization layers in the model (one possible workaround is sketched after this list).
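A minimal sketch of that workaround, assuming the TensorRT Python API (8.6 or newer, where `trt.LayerType.NORMALIZATION` exists); file names are placeholders. It enables FP16 overall but pins normalization layers to FP32, which often avoids the all-zero FP16 outputs:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honor the per-layer precisions set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin normalization layers to FP32 so their reductions don't overflow in FP16.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.NORMALIZATION:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

with open("model_fp16.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```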

Please let me know if you can help with any of the above issues that I am facing.

vsingh-sereact · Apr 25 '25

I'm surprised by the perf number. How are you running the torch model? Did you use torch.compile()? How are you timing the TRT engine? If you run with trtexec, you can get the latency profiling results at the end.
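For the PyTorch side, a minimal timing sketch with CUDA events (the model and input shape are placeholders; the warm-up keeps compilation and autotuning out of the measurement):

```python
import torch

model = MyCascadeMaskRCNN().eval().cuda()  # hypothetical model
x = torch.randn(1, 3, 800, 800, device="cuda")
compiled = torch.compile(model)

with torch.inference_mode():
    for _ in range(10):  # warm-up: triggers compilation and autotuning
        compiled(x)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
with torch.inference_mode():
    for _ in range(100):
        compiled(x)
end.record()
torch.cuda.synchronize()
print(f"avg latency: {start.elapsed_time(end) / 100:.3f} ms")
```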

yuanyao-nv · Apr 25 '25

The performance is similar when I am not using xformers, but with xformers the PyTorch model is pretty fast. I am just loading the checkpoint and getting the outputs of the model. I am timing using trtexec only.

vsingh-sereact · Apr 28 '25

When running with trtexec you can add the --dumpLayerInfo --dumpProfile flags to print out more detailed timing, and check if MHA shows up as a fused layer.
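For example (the engine path is a placeholder; `--separateProfileRun` keeps per-layer profiling overhead out of the end-to-end latency numbers):

```
trtexec --loadEngine=model.engine --dumpLayerInfo --dumpProfile --separateProfileRun
```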

yuanyao-nv · Apr 29 '25

I'll run that and post it here. Also, I am facing one other problem. I have 6 different heads in my network, and 4 of them are working fine. But the other two heads, which are just classification and scoring heads with conv and linear layers, are completely wrong: both the scale of the values and the values themselves come out wrong. Those are the simplest heads in the network. Does TensorRT change the scale or something?

vsingh-sereact · Apr 29 '25

For accuracy issues I'd suggest running with onnxruntime to verify. Polygraphy provides a convenient way to compare the onnxruntime and TRT results side by side: polygraphy run --trt --onnxrt model.onnx
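To localize where the bad heads start to diverge, you can also mark all intermediate tensors as outputs and compare them per layer (a sketch; the tolerances are assumptions to tune):

```
polygraphy run model.onnx --trt --onnxrt \
    --trt-outputs mark all --onnx-outputs mark all \
    --atol 1e-3 --rtol 1e-3
```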

yuanyao-nv · Apr 29 '25