FP16 failure of TensorRT 8.6.1.6 when running GroundingDINO on GPU GeForce RTX 3080 Ti
Description
I tried to convert the GroundingDINO ONNX FP16 model (opset 17) to TensorRT FP16, but the difference in the outputs is large. I then tried to control the input and output types of the INormalization layers, using the following code.
When converting the model, it fails with the following error. It seems to be a problem with data types, but I do not know how to fix it.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: GeForce RTX 3080 Ti
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
CUDNN Version: 8.9.5.29
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.0.1
Have you tried the latest release?: I didn't find TensorRT 9.1.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes, the FP16 ONNX model is fine.
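For reference, a minimal build sketch along these lines, pinning INormalization (LayerNorm) layers to FP32 inside an otherwise FP16 engine, might look as follows. This is illustrative only and not the original poster's script; the model path, output names, and layer-type filter are assumptions:

import tensorrt as trt

# Illustrative sketch (not the original script): build an FP16 engine while
# keeping normalization layers in FP32.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("groundingdino_fp16_opset17.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the per-layer precision requests binding rather than hints.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin every INormalization (LayerNorm) layer to FP32 to avoid FP16 overflow.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.NORMALIZATION:
        layer.precision = trt.DataType.FLOAT
        layer.set_output_type(0, trt.DataType.FLOAT)

engine_bytes = builder.build_serialized_network(network, config)
with open("groundingdino_fp16.engine", "wb") as f:
    f.write(engine_bytes)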
@nvpohanh Will inserting an explicit cast work here?
I also ran into the same problem on 3090 GPUs; this has been bugging me for days.
Could you share the ONNX file? It seems that the layer precision setting is conflicting with Cast ops in the ONNX files.
How do you build a mixed-precision engine? Can you provide an example?
I tried setting the backbone to use FP16 and the encoder-decoder part to use FP32. The results roughly match the FP32 engine, but it is not as fast as the pure FP16 version.
I notice that the NormalizationLayer always uses FP32 no matter what the build config is (correct me if I am wrong).
Maybe the overflow is caused by other nodes. I am still looking deeper into the details.
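(Aside: one way to verify which precision each layer actually runs in, rather than guessing, is to build with detailed profiling verbosity and dump the engine inspector output. A rough sketch, assuming builder, network, config, and logger from an existing build script:)

# Sketch: check the precision each layer actually runs in.
# Assumes `builder`, `network`, `config`, and `logger` from an existing build script.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
engine_bytes = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(engine_bytes)
inspector = engine.create_engine_inspector()
# The JSON output lists, per layer, the chosen tactic and precision.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))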
How can you set different precisions for different parts when creating the engine? Can you show me a code example? @HaisongDing
For example, in the detectron2 sample, adding something like the following will make all MATRIX_MULTIPLY and SOFTMAX nodes run in FP32 except those in the backbone:
# Force MATRIX_MULTIPLY and SOFTMAX layers outside the backbone to FP32.
for layer in self.network:
    if 'backbone' not in str(layer.name):
        if layer.type in [trt.LayerType.MATRIX_MULTIPLY, trt.LayerType.SOFTMAX]:
            layer.precision = trt.DataType.FLOAT
            layer.get_output(0).dtype = trt.DataType.FLOAT
Edit: OBEY_PRECISION_CONSTRAINTS should also be set.
self.config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
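(Note: TensorRT also offers trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS, which treats these per-layer settings as preferences and falls back with a warning when a constraint cannot be met; OBEY_PRECISION_CONSTRAINTS makes them mandatory and fails the build otherwise.)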
I set all layers in the backbone to 'trt.DataType.FLOAT', but I still get warnings like 'Detected layernorm nodes in FP16: .../backbone/stages.1/blocks.1/norm1/Add, ...'. The LayerNorm is not being set to the expected precision, so the setting seems to have no effect.
@monsterlyg Update torch to >= 1.13.1 to use opset 17 when exporting to ONNX. Update to TensorRT 8.6.1 to use INormalization layers.
@HaisongDing Yes, I have used torch 1.13.1, opset 17, and TensorRT 8.6.1. The LayerNorm nodes in the ONNX graph are still described as 'Sub, Add, ReduceMean, ...'. Should they appear as a single INormalization node if everything is used properly?
After some experiments, I found that the reason for this discrepancy is the SOFTMAX nodes in Grounding-DINO's encoder-decoder part. After setting them to use FP32, the discrepancy is almost gone. Also, running these SOFTMAX nodes in FP32 does not affect the inference latency much.
@monsterlyg As for the Normalization layer, maybe it is not written exactly as a torch.nn.LayerNorm layer, which causes the ONNX exporter to fail to recognize the operation sequence as a Normalization node. I didn't notice these errors on my side.
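(For reference, a rough export sketch, assuming the model's normalization is implemented with standard torch.nn.LayerNorm modules; with torch >= 1.13.1 and opset 17 these export as single LayerNormalization nodes, which TensorRT 8.6 maps to INormalization layers. The constructor, input shape, and I/O names below are placeholders, not the actual Grounding-DINO export code:)

import torch

# Illustrative export sketch -- the constructor, input shape, and I/O names
# are placeholders, not the actual Grounding-DINO export code.
model = build_groundingdino().eval()        # hypothetical model constructor
dummy_image = torch.randn(1, 3, 800, 1200)  # placeholder input

torch.onnx.export(
    model,
    (dummy_image,),
    "groundingdino_opset17.onnx",
    opset_version=17,                       # needed for LayerNormalization export
    input_names=["image"],
    output_names=["logits", "boxes"],
    dynamic_axes={"image": {0: "batch"}},
)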
@HaisongDing I tried setting the data type of the softmax layers to float; the converted model can then produce results, but the results are incorrect. Did you run inference on images with the converted model, or test AP on the COCO dataset, to compare its accuracy with the ONNX model?
I only converted a customized Grounding-DINO model. Also, the BERT part is pre-computed in my setup, so only the backbone and encoder-decoder are converted to TensorRT. On my customized dataset, the test AP50 is almost the same (within an absolute 0.1% difference).
@nvpohanh I have uploaded the onnx file: https://drive.google.com/file/d/1IbMfTnsXbyqmpZ2Mfb7SC1oxzyYn9QBN/view?usp=drive_link
@HaisongDing I also tried that method, but the result is the same as before. Can you share your ONNX-to-TensorRT conversion script? Thanks.
@zerollzeng Could you repro this and file an internal tracker? Let me know if you don't have the bandwidth. Thanks
Can anyone provide steps to reproduce? Thanks!
@zhuyingSeu Could you share the code you used to build the engine? Or did you use trtexec?
@zerollzeng @nvpohanh I used this script to build the engine: https://drive.google.com/file/d/1kQdR2QdhOHJHnGB3fDicnGl73zpE2YJ6/view?usp=drive_link
@zhuyingSeu Hello, may I request access to the shared drive link? I would also like to see the TensorRT conversion process. Much appreciated!
Requested access.
Thank you for your reply. I have already applied with the account [email protected]; why do I still need to request access permission?
It is the same on both my phone and PC.
Hello, the ONNX file you provided can be converted to an engine using the trtexec tool. May I ask which script you used to export the model to ONNX? Would it be convenient to provide that script? Thank you!