TensorRT
Polygraphy validation failed for TensorRT BERT model
Description
I logged an issue against Triton Inference Server (https://github.com/triton-inference-server/server/issues/4842), and @rmccorm4 suggested filing it against TensorRT because Polygraphy validation fails for a BERT model.
Environment
TensorRT Version: 8.2.5-1+cuda11.4
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.7
CUDNN Version:
Operating System: Linux
Python Version (if applicable): 3.8.13
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.12.0
Baremetal or Container (if so, version): nvcr.io/nvidia/pytorch:22.05-py3
Steps To Reproduce
- ONNX model: Polygraphy validation passes. Results: onnx-model-validation-results.txt
- Converted the ONNX model to a TRT engine with minShapes 1x1 using the command below; its Polygraphy validation fails.

  ```
  trtexec --onnx=model.onnx --saveEngine=model_bs16.plan \
      --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 \
      --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 \
      --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 \
      --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
  ```

  Results: ort-model-minshape-1x1-validation-results.txt
- Converted the ONNX model to a TRT engine with minShapes 1x128 using the command below; its Polygraphy validation also fails.

  ```
  trtexec --onnx=model.onnx --saveEngine=model_bs16.plan \
      --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 \
      --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 \
      --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 \
      --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
  ```

  Results: ort-model-minshape-1x128-validation-results.txt
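As a quick sanity check on the optimization profiles above, the `--minShapes`/`--optShapes`/`--maxShapes` specs can be parsed and verified to be elementwise ordered (min ≤ opt ≤ max) for every input, which is what TensorRT requires of a profile. A minimal sketch (the helper functions below are hypothetical, not part of trtexec):

```python
# Hypothetical helper: parse a trtexec-style shape spec like
# "input_ids:1x1,attention_mask:1x1" into {name: (dim, dim, ...)}.
def parse_shapes(spec):
    out = {}
    for item in spec.split(","):
        name, dims = item.rsplit(":", 1)
        out[name] = tuple(int(d) for d in dims.split("x"))
    return out

def profile_is_ordered(min_spec, opt_spec, max_spec):
    """True if min <= opt <= max holds for every dimension of every input."""
    lo, opt, hi = (parse_shapes(s) for s in (min_spec, opt_spec, max_spec))
    return all(
        a <= b <= c
        for name in lo
        for a, b, c in zip(lo[name], opt[name], hi[name])
    )

# The 1x1 profile from the failing command is a valid (ordered) profile:
assert profile_is_ordered(
    "input_ids:1x1,attention_mask:1x1,token_type_ids:1x1",
    "input_ids:16x128,attention_mask:16x128,token_type_ids:16x128",
    "input_ids:128x128,attention_mask:128x128,token_type_ids:128x128",
)
```

Since the profiles themselves are well-formed, the failure is not a profile-specification error but something inside engine building.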
Please find the model.onnx file on G-drive
Please let me know if you need any additional details. Thanks
I'll check this later
@Vinayaks117 I don't have much time now, can you try the latest TRT on your side?
@zerollzeng
I tried it with the environment below and Polygraphy validation passes. Thanks

TensorRT Version: 8.4.1
Baremetal or Container (if so, version): nvcr.io/nvidia/pytorch:22.07-py3
My observations: I found that some unnamed layers are created while converting the ONNX model to TRT; I am not sure whether those layers are what causes the issue with TensorRT 8.2.5 in the 22.05 PyTorch container.
```
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.weight with (Unnamed Layer* 1789) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.10.output.LayerNorm.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.bias with (Unnamed Layer* 1792) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1752
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1752 with (Unnamed Layer* 1795) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.query.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.query.bias with (Unnamed Layer* 1798) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1753
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1753 with (Unnamed Layer* 1801) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.key.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.key.bias with (Unnamed Layer* 1804) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1756
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1756 with (Unnamed Layer* 1817) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.value.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.value.bias with (Unnamed Layer* 1820) [Shuffle]
```
Please find the attached conversion logs. conversion.txt
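For what it's worth, these `(Unnamed Layer* N)` names are just the placeholders TensorRT assigns to layers that have no name in the ONNX graph (common for the Shuffle layers inserted during constant fusion), so they are expected in a fused BERT graph. To see how often they appear, the verbose log can be scanned for that pattern; a rough sketch on a two-line excerpt of the log above:

```python
import re

# Matches TensorRT's placeholder names, e.g. "(Unnamed Layer* 1789) [Shuffle]"
UNNAMED = re.compile(r"\(Unnamed Layer\* (\d+)\) \[(\w+)\]")

log = """\
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.weight with (Unnamed Layer* 1789) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1752 with (Unnamed Layer* 1795) [Shuffle]
"""

hits = UNNAMED.findall(log)  # [(layer_id, layer_type), ...]
print(f"{len(hits)} unnamed layers, types: {sorted({t for _, t in hits})}")
```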
This should be the same issue as https://github.com/NVIDIA/TensorRT/issues/2338. It can be fixed with the preview feature in TRT 8.5.1:
```
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=model.onnx --preview=+fasterDynamicShapes0805 --saveEngine=model_bs16.plan --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 --
...
[I] PASSED | Output: logits is valid
[I] PASSED | Output Validation
[V] Loaded Module: sys
[I] PASSED | Command: /home/zeroz/.local/bin/polygraphy run --trt model_bs16.plan --validate -vv
```
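For reference, `polygraphy run --validate` passes when no output tensor contains NaNs or Infs; under `--fp16`, an accumulation overflow in the engine typically surfaces as Infs in the logits. A minimal numpy sketch of that kind of check (not Polygraphy's actual implementation):

```python
import numpy as np

def validate_output(name, arr):
    """Report whether an output tensor is free of NaNs and Infs."""
    nans = int(np.isnan(arr).sum())
    infs = int(np.isinf(arr).sum())
    ok = nans == 0 and infs == 0
    status = "PASSED" if ok else "FAILED"
    print(f"[I] {status} | Output: {name} ({nans} NaNs, {infs} Infs)")
    return ok

# An fp16 overflow would show up as Infs in the engine's logits:
good = np.zeros((1, 2), dtype=np.float16)
bad = np.array([np.inf, np.nan], dtype=np.float16)
assert validate_output("logits", good)
assert not validate_output("logits", bad)
```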
TRT 8.5.1 has been released.
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!