TensorRT
Polygraphy validation failed for TensorRT BERT model
Description
I logged an issue against Triton Inference Server (https://github.com/triton-inference-server/server/issues/4842), and @rmccorm4 suggested filing it against TensorRT because Polygraphy validation fails for a BERT model.
Environment
TensorRT Version: 8.2.5-1+cuda11.4
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.7
CUDNN Version:
Operating System: Linux
Python Version (if applicable): 3.8.13
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 1.12.0
Baremetal or Container (if so, version): nvcr.io/nvidia/pytorch:22.05-py3
Steps To Reproduce
- ONNX model: Polygraphy validation passes. Results: onnx-model-validation-results.txt
- Converted the ONNX model to a TRT engine with minShapes 1x1 using the command below; its Polygraphy validation fails.

  ```
  trtexec --onnx=model.onnx --saveEngine=model_bs16.plan \
      --minShapes=input_ids:1x1,attention_mask:1x1,token_type_ids:1x1 \
      --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 \
      --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 \
      --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
  ```

  Results: ort-model-minshape-1x1-validation-results.txt
- Converted the ONNX model to a TRT engine with minShapes 1x128 using the command below; its Polygraphy validation also fails.

  ```
  trtexec --onnx=model.onnx --saveEngine=model_bs16.plan \
      --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 \
      --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 \
      --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 \
      --fp16 --verbose --workspace=14000 | tee conversion_bs16_dy.txt
  ```

  Results: ort-model-minshape-1x128-validation-results.txt
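As a quick sanity check on the optimization profiles above, the `--minShapes`/`--optShapes`/`--maxShapes` specs can be parsed and verified to be elementwise ordered (min ≤ opt ≤ max) for every input, which is what TensorRT requires of a profile. A minimal sketch (the helper functions below are hypothetical, not part of trtexec):

```python
# Hypothetical helper: parse a trtexec-style shape spec like
# "input_ids:1x1,attention_mask:1x1" into {name: (dim, dim, ...)}.
def parse_shapes(spec):
    out = {}
    for item in spec.split(","):
        name, dims = item.rsplit(":", 1)
        out[name] = tuple(int(d) for d in dims.split("x"))
    return out

def profile_is_ordered(min_spec, opt_spec, max_spec):
    """True if min <= opt <= max holds for every dimension of every input."""
    lo, opt, hi = (parse_shapes(s) for s in (min_spec, opt_spec, max_spec))
    return all(
        a <= b <= c
        for name in lo
        for a, b, c in zip(lo[name], opt[name], hi[name])
    )

# The 1x1 profile from the failing command is a valid (ordered) profile:
assert profile_is_ordered(
    "input_ids:1x1,attention_mask:1x1,token_type_ids:1x1",
    "input_ids:16x128,attention_mask:16x128,token_type_ids:16x128",
    "input_ids:128x128,attention_mask:128x128,token_type_ids:128x128",
)
```

Since the profiles themselves are well-formed, the failure is not a profile-specification error but something inside engine building.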
Please find the model.onnx file on G-drive
Please let me know if you need any additional details. Thanks
I'll check this later
@Vinayaks117 I don't have much time now, can you try the latest TRT on your side?
@zerollzeng
I tried it with the environment below and Polygraphy validation passes. Thanks

TensorRT Version: 8.4.1
Baremetal or Container (if so, version): nvcr.io/nvidia/pytorch:22.07-py3
My observations: I found that some unnamed layers are created while converting the ONNX model to TRT; I am not sure whether those layers are what causes the issue with TensorRT 8.2.5 in the 22.05 PyTorch container.
```
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.weight with (Unnamed Layer* 1789) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.10.output.LayerNorm.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.bias with (Unnamed Layer* 1792) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1752
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1752 with (Unnamed Layer* 1795) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.query.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.query.bias with (Unnamed Layer* 1798) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1753
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1753 with (Unnamed Layer* 1801) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.key.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.key.bias with (Unnamed Layer* 1804) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on onnx::MatMul_1756
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1756 with (Unnamed Layer* 1817) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] Running: ConstShuffleFusion on bert.encoder.layer.11.attention.self.value.bias
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.11.attention.self.value.bias with (Unnamed Layer* 1820) [Shuffle]
```
Please find the attached conversion logs. conversion.txt
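For what it's worth, these `(Unnamed Layer* N)` names are just the placeholders TensorRT assigns to layers that have no name in the ONNX graph (common for the Shuffle layers inserted during constant fusion), so they are expected in a fused BERT graph. To see how often they appear, the verbose log can be scanned for that pattern; a rough sketch on a two-line excerpt of the log above:

```python
import re

# Matches TensorRT's placeholder names, e.g. "(Unnamed Layer* 1789) [Shuffle]"
UNNAMED = re.compile(r"\(Unnamed Layer\* (\d+)\) \[(\w+)\]")

log = """\
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing bert.encoder.layer.10.output.LayerNorm.weight with (Unnamed Layer* 1789) [Shuffle]
[09/23/2022-10:37:35] [V] [TRT] ConstShuffleFusion: Fusing onnx::MatMul_1752 with (Unnamed Layer* 1795) [Shuffle]
"""

hits = UNNAMED.findall(log)  # [(layer_id, layer_type), ...]
print(f"{len(hits)} unnamed layers, types: {sorted({t for _, t in hits})}")
```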
This should be the same issue as https://github.com/NVIDIA/TensorRT/issues/2338. It can be fixed with the preview feature in TRT 8.5.1:
```
&&&& PASSED TensorRT.trtexec [TensorRT v8501] # trtexec --onnx=model.onnx --preview=+fasterDynamicShapes0805 --saveEngine=model_bs16.plan --minShapes=input_ids:1x128,attention_mask:1x128,token_type_ids:1x128 --optShapes=input_ids:16x128,attention_mask:16x128,token_type_ids:16x128 --maxShapes=input_ids:128x128,attention_mask:128x128,token_type_ids:128x128 --fp16 --verbose --workspace=14000 --
...
[I] PASSED | Output: logits is valid
[I] PASSED | Output Validation
[V] Loaded Module: sys
[I] PASSED | Command: /home/zeroz/.local/bin/polygraphy run --trt model_bs16.plan --validate -vv
```
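For reference, `polygraphy run --validate` passes when no output tensor contains NaNs or Infs; under `--fp16`, an accumulation overflow in the engine typically surfaces as Infs in the logits. A minimal numpy sketch of that kind of check (not Polygraphy's actual implementation):

```python
import numpy as np

def validate_output(name, arr):
    """Report whether an output tensor is free of NaNs and Infs."""
    nans = int(np.isnan(arr).sum())
    infs = int(np.isinf(arr).sum())
    ok = nans == 0 and infs == 0
    status = "PASSED" if ok else "FAILED"
    print(f"[I] {status} | Output: {name} ({nans} NaNs, {infs} Infs)")
    return ok

# An fp16 overflow would show up as Infs in the engine's logits:
good = np.zeros((1, 2), dtype=np.float16)
bad = np.array([np.inf, np.nan], dtype=np.float16)
assert validate_output("logits", good)
assert not validate_output("logits", bad)
```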
TRT 8.5.1 has been released.
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!