FP16 failure of TensorRT 8.6.1.6 when running GroundingDINO on GPU GeForce RTX 3080 Ti
Description
I tried to convert the GroundingDINO ONNX FP16 model (opset 17) to TensorRT FP16, but the difference in the outputs is large. I then tried to control the input and output types of the INormalization layers, using the following code.
When converting the model, it fails with the following error. It seems to be a problem with data types, but I do not know how to fix it.
Environment
TensorRT Version: 8.6.1.6
NVIDIA GPU: GeForce RTX 3080 Ti
NVIDIA Driver Version: 515.43.04
CUDA Version: 11.7
CUDNN Version: 8.9.5.29
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.9
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.0.1
Have you tried the latest release?: I didn't find TensorRT 9.1.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes, the FP16 ONNX model is fine.
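For reference, a minimal build sketch along these lines, pinning INormalization (LayerNorm) layers to FP32 inside an otherwise FP16 engine, might look as follows. This is illustrative only and not the original poster's script; the model path, output names, and layer-type filter are assumptions:

import tensorrt as trt

# Illustrative sketch (not the original script): build an FP16 engine while
# keeping normalization layers in FP32.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("groundingdino_fp16_opset17.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make the per-layer precision requests binding rather than hints.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Pin every INormalization (LayerNorm) layer to FP32 to avoid FP16 overflow.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.NORMALIZATION:
        layer.precision = trt.DataType.FLOAT
        layer.set_output_type(0, trt.DataType.FLOAT)

engine_bytes = builder.build_serialized_network(network, config)
with open("groundingdino_fp16.engine", "wb") as f:
    f.write(engine_bytes)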
@nvpohanh Will inserting an explicit cast work here?
I also ran into the same problem on 3090 GPUs; this has been bugging me for days.
Could you share the ONNX file? It seems that the layer precision setting is conflicting with Cast ops in the ONNX files.
How do you build a mixed-precision engine? Can you provide an example?
I tried setting the backbone to use FP16 and the encoder-decoder part to use FP32. The results roughly match the FP32 engine, but it is not as fast as the pure FP16 version.
I notice that the NormalizationLayer always uses FP32 no matter what the build config is (correct me if I am wrong).
Maybe the overflow is caused by other nodes. I am still looking deeper into the details.
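(Aside: one way to verify which precision each layer actually runs in, rather than guessing, is to build with detailed profiling verbosity and dump the engine inspector output. A rough sketch, assuming builder, network, config, and logger from an existing build script:)

# Sketch: check the precision each layer actually runs in.
# Assumes `builder`, `network`, `config`, and `logger` from an existing build script.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
engine_bytes = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(engine_bytes)
inspector = engine.create_engine_inspector()
# The JSON output lists, per layer, the chosen tactic and precision.
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))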
How can you set different precisions for different parts when creating the engine? Can you show me a code example? @HaisongDing
For example, in the detectron2 sample, adding something like the following will make all MATRIX_MULTIPLY and SOFTMAX nodes run in FP32 except those in the backbone:
# Force MATRIX_MULTIPLY and SOFTMAX layers outside the backbone to FP32.
for layer in self.network:
    if 'backbone' not in str(layer.name):
        if layer.type in [trt.LayerType.MATRIX_MULTIPLY, trt.LayerType.SOFTMAX]:
            layer.precision = trt.DataType.FLOAT
            layer.get_output(0).dtype = trt.DataType.FLOAT
Edit: OBEY_PRECISION_CONSTRAINTS should also be set.
self.config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
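(Note: TensorRT also offers trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS, which treats these per-layer settings as preferences and falls back with a warning when a constraint cannot be met; OBEY_PRECISION_CONSTRAINTS makes them mandatory and fails the build otherwise.)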
I set all layers in the backbone to 'trt.DataType.FLOAT', but I still get warnings like 'Detected layernorm nodes in FP16: .../backbone/stages.1/blocks.1/norm1/Add, ...'. The LayerNorm is not being set to the expected precision, so the setting seems to have no effect.
@monsterlyg Update torch to >= 1.13.1 to use opset 17 when exporting to ONNX. Update to TensorRT 8.6.1 to use INormalization layers.
@HaisongDing Yes, I have used torch 1.13.1, opset 17, and TensorRT 8.6.1. The LayerNorm nodes in the ONNX graph are still described as 'Sub, Add, ReduceMean, ...'. Should they appear as a single INormalization node if everything is used properly?
After some experiments, I found that the reason for this discrepancy is the SOFTMAX nodes in Grounding-DINO's encoder-decoder part. After setting them to use FP32, the discrepancy is almost gone. Also, running these SOFTMAX nodes in FP32 does not affect the inference latency much.
@monsterlyg As for the Normalization layer, maybe it is not written exactly as a torch.nn.LayerNorm layer, which causes the ONNX exporter to fail to recognize the operation sequence as a Normalization node. I didn't notice these errors on my side.
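(For reference, a rough export sketch, assuming the model's normalization is implemented with standard torch.nn.LayerNorm modules; with torch >= 1.13.1 and opset 17 these export as single LayerNormalization nodes, which TensorRT 8.6 maps to INormalization layers. The constructor, input shape, and I/O names below are placeholders, not the actual Grounding-DINO export code:)

import torch

# Illustrative export sketch -- the constructor, input shape, and I/O names
# are placeholders, not the actual Grounding-DINO export code.
model = build_groundingdino().eval()        # hypothetical model constructor
dummy_image = torch.randn(1, 3, 800, 1200)  # placeholder input

torch.onnx.export(
    model,
    (dummy_image,),
    "groundingdino_opset17.onnx",
    opset_version=17,                       # needed for LayerNormalization export
    input_names=["image"],
    output_names=["logits", "boxes"],
    dynamic_axes={"image": {0: "batch"}},
)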
@HaisongDing I tried setting the data type of the softmax layers to float; the converted model can then produce results, but the results are incorrect. Did you run inference on images with the converted model, or test AP on the COCO dataset, to compare its accuracy with the ONNX model?
I only converted a customized Grounding-DINO model. Also, the BERT part is pre-computed in my setup, so only the backbone and encoder-decoder are converted to TensorRT. On my customized dataset, the test AP50 is almost the same (within an absolute 0.1% difference).
@nvpohanh I have uploaded the onnx file: https://drive.google.com/file/d/1IbMfTnsXbyqmpZ2Mfb7SC1oxzyYn9QBN/view?usp=drive_link
@HaisongDing I also tried that method, but the result is the same as before. Can you share your ONNX-to-TensorRT conversion script? Thanks.
@zerollzeng Could you repro this and file an internal tracker? Let me know if you don't have the bandwidth. Thanks
Can anyone provide steps to reproduce? Thanks!
@zhuyingSeu Could you share the code you used to build the engine? Or did you use trtexec?
@zerollzeng @nvpohanh I used this script to build the engine: https://drive.google.com/file/d/1kQdR2QdhOHJHnGB3fDicnGl73zpE2YJ6/view?usp=drive_link
@zhuyingSeu Hello, may I request access to the shared drive link? I would also like to see the TensorRT conversion process. Much appreciated!
Requested access.
Thank you for your reply. I have already applied with the account [email protected]; why do I still need to request access permission?
It is the same on both my phone and PC.
Hello, the ONNX file you provided can be converted to an engine using the trtexec tool. May I ask which script you used to export the model to ONNX? Would it be convenient to provide that script? Thank you!