Result divergence of deployed ConvNeXt model
I used the latest MMDeploy tool to convert ConvNeXt to the TensorRT backend. The conversion succeeded, but when I visualized the results, the output from the TensorRT backend was completely wrong.
I checked the export process and found that the output of the intermediate ONNX model was correct, so I suspect the issue comes from the TensorRT model.
Below is the command and configs I have used.
python tools/deploy.py \
configs/mmseg/segmentation_tensorrt-fp16_static-512x512.py \
./mmsegmentation/configs/convnext/upernet_convnext_tiny_fp16_512x512_160k_ade20k.py \
./mmsegmentation/ckpts/upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553-cad485de.pth \
./test.jpg \
--work-dir work-dirs \
--device cuda:0
segmentation_tensorrt-fp16_static-512x512.py
upernet_convnext_tiny_fp16_512x512_160k_ade20k.py
upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553-cad485de.pth
@haofanwang Are you testing the ONNX model with FP32 in ONNX Runtime? Maybe you could try TensorRT FP32 as well. BTW, we'll look into the problems of exporting a PyTorch FP16 model to TensorRT FP16/INT8 later.
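A minimal sketch of such an FP32 conversion, assuming MMDeploy ships a matching non-FP16 static config (the name segmentation_tensorrt_static-512x512.py is an assumption; check configs/mmseg/ for the exact file in your version):

# Sketch: same conversion as above, but with an (assumed) FP32 TensorRT config,
# so the comparison against ONNX Runtime isolates the effect of the FP16 cast.
python tools/deploy.py \
configs/mmseg/segmentation_tensorrt_static-512x512.py \
./mmsegmentation/configs/convnext/upernet_convnext_tiny_fp16_512x512_160k_ade20k.py \
./mmsegmentation/ckpts/upernet_convnext_tiny_fp16_512x512_160k_ade20k_20220227_124553-cad485de.pth \
./test.jpg \
--work-dir work-dirs-fp32 \
--device cuda:0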
FP32 works. Please let me know if there is any update on FP16, thanks.
Hi, yes, there is a noticeable difference between the FP16 PyTorch model and the converted FP16 TensorRT model. We are not sure which layer has problems such as numerical overflow. Maybe you could ask for suggestions in the NVIDIA repo as well.
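One way to narrow this down is to let Polygraphy compare ONNX Runtime (FP32) against a TensorRT FP16 engine built from the same exported model. This is only a sketch: the path work-dirs/end2end.onnx assumes the intermediate ONNX that deploy.py writes into --work-dir, and the tolerances are illustrative.

# Sketch: compare ONNX Runtime vs. TensorRT FP16 on the exported ONNX model.
pip install polygraphy
polygraphy run work-dirs/end2end.onnx --onnxrt --trt --fp16 \
--atol 1e-2 --rtol 1e-2
# Adding "--onnx-outputs mark all --trt-outputs mark all" compares every
# intermediate tensor, which can help locate the first layer that overflows.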
This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.
This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.
@haofanwang have you solved your problem?
Same here. ConvNeXt in a detection model is sensitive to TensorRT FP16. For ConvNeXt V2 the problem is even worse because of GRN; with it, the TensorRT FP16 model can't detect anything!
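If the divergence does come from LayerNorm/GRN overflowing in FP16, one workaround sketch is to build an FP16 engine while pinning those layers to FP32. This assumes a recent trtexec (TensorRT 8.5+) and that the affected layer names match "*Norm*", both of which are assumptions; inspect the real layer names (e.g. with polygraphy inspect model) first.

# Sketch: FP16 engine with normalization layers forced back to FP32.
trtexec --onnx=work-dirs/end2end.onnx --fp16 \
--precisionConstraints=obey \
--layerPrecisions=*Norm*:fp32 \
--saveEngine=work-dirs/end2end_mixed.engine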