TensorRT for cascaded networks
Hi,
Does anyone have experience converting cascaded networks like CascadedMaskRCNN to TensorRT? The conversion isn't straightforward, but I would love to know if anyone has given it a shot. In particular, I am confused about how to handle the pooling operators. Please let me know, or point me to any repo/issue covering something similar.
Thanks!
Have you tried using the ONNX exporter? https://pytorch.org/docs/stable/onnx.html
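If it helps, here's a minimal export sketch. The tiny nn.Sequential is just a stand-in for your CascadedMaskRCNN, and the input shape, opset, and tensor names are assumptions to adapt; detection models with data-dependent control flow usually need more work than this.

```python
import torch
import torch.nn as nn

# Stand-in for your CascadedMaskRCNN; swap in the real model here.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 4, 3))
model.eval()

dummy_input = torch.randn(1, 3, 800, 800)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=17,                       # pick an opset your TensorRT build supports
    input_names=["images"],
    output_names=["outputs"],
    dynamic_axes={"images": {0: "batch"}},  # optional: dynamic batch dimension
)
```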
Yes, I made it work, but it turns out my TensorRT engine is 2x slower than the PyTorch version. The reason is likely one of these:
- I am using xformers in the PyTorch version, and xformers is not supported by TensorRT.
- Because of the cascaded bounding-box regression heads, I had to use 3 separate poolers in the ONNX conversion, while PyTorch just uses a for loop.
- I am using EfficientNMS after every box head to get the boxes and pass them to the next box head. Is there a better way to decode the boxes and send them to the next head?
- I am using fp32 in the TensorRT conversion. Fp16 gives all-zero output and throws a warning about the normalization layers in the model (see the sketch after this list for what I plan to try).
Please let me know if you can help with any of the above issues that I am facing.
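For the fp16 point specifically, what I plan to try is keeping the normalization layers in fp32 via precision constraints while the rest of the network runs in fp16. A rough sketch with the TensorRT 8.x Python builder API; the name-based layer filter is only a guess, since the layer names depend on how the ONNX graph gets parsed:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the builder honor the per-layer precision requests below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    # Guess: select normalization layers by name; inspect the parsed
    # network first to see what the layers are actually called.
    if "Norm" in layer.name:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

with open("model.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))
```

I believe trtexec exposes the same knobs through --precisionConstraints=obey and --layerPrecisions, if that's easier than the API.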
I'm surprised by the perf number. How are you running the torch model? Did you use torch.compile()? How are you timing the TRT engine? If you run with trtexec you can get the latency profiling results at the end.
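For the PyTorch side, I'd measure with CUDA events after a warm-up so compilation and kernel-launch overhead don't skew the number. A minimal sketch; the model is a placeholder for your checkpoint and the input shape is assumed:

```python
import torch
import torch.nn as nn

# Placeholder: swap in your CascadedMaskRCNN checkpoint here.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).cuda().eval()
compiled = torch.compile(model)
x = torch.randn(1, 3, 800, 800, device="cuda")

with torch.inference_mode():
    # Warm up so compilation/autotuning doesn't pollute the measurement.
    for _ in range(10):
        compiled(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):
        compiled(x)
    end.record()
    torch.cuda.synchronize()

print(f"mean latency: {start.elapsed_time(end) / 100:.2f} ms")
```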
The performance is similar when I am not using xformers, but with xformers the PyTorch model is pretty fast. I am just loading the checkpoint and getting the model outputs. I am timing with trtexec only.
When running with trtexec you can add the --dumpLayerInfo and --dumpProfile flags to print more detailed per-layer timing, and check whether MHA shows up as a fused layer.
I'll run that and post the results here. Also, I am facing another problem. I have 6 different heads in my network, and 4 of them work fine. But the other two heads, which are just classification and scoring heads built from conv and linear layers, come out completely wrong: both the scale of the values and the values themselves are off. Those are the simplest heads in the network. Does TensorRT change the scale or something?
For accuracy issues I'd suggest running with onnxruntime to verify. Polygraphy provides a convenient way to compare the onnxruntime and TRT results side by side: `polygraphy run --trt --onnxrt model.onnx`
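If Polygraphy confirms a mismatch, you can also diff individual outputs by hand to see exactly which heads diverge. A quick sketch with onnxruntime; it assumes the ONNX outputs come back in the same order as the PyTorch ones, and the tiny model is again just a stand-in for your network:

```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

# Placeholder: use the same model instance you exported to model.onnx.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 4, 3)).eval()
x = torch.randn(1, 3, 800, 800)

with torch.inference_mode():
    torch_outputs = model(x)
if isinstance(torch_outputs, torch.Tensor):
    torch_outputs = [torch_outputs]

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
ort_outputs = sess.run(None, {sess.get_inputs()[0].name: x.numpy()})

# Check the two suspect classification/scoring heads first.
for out_info, t, o in zip(sess.get_outputs(), torch_outputs, ort_outputs):
    print(out_info.name, "max abs diff:", np.abs(t.numpy() - o).max())
```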