Conv2d stride=2 accuracy mismatch between PyTorch and TensorRT

Open schegde opened this issue 3 years ago • 7 comments

Description

A PyTorch Conv2d operation with stride=2, after ONNX conversion and TensorRT inference, shows an accuracy mismatch against PyTorch inference. The mismatch is not seen with stride=1. The Conv2d signature is nn.Conv2d(64, 1, 3, stride=STRIDE, bias=False, padding=1).
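For orientation, the output shapes that show up in the Polygraphy logs later in this thread (5x5 for stride=2 and 9x9 for stride=1, from a 9x9 input) follow from the standard convolution output-size formula. This is a minimal sketch; `conv_out_size` is a hypothetical helper written for illustration, not part of PyTorch or TensorRT, and it assumes dilation 1.

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution with dilation 1 (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


# nn.Conv2d(64, 1, 3, stride=STRIDE, bias=False, padding=1) applied to a 9x9 input
out_stride1 = conv_out_size(9, kernel=3, stride=1, padding=1)
out_stride2 = conv_out_size(9, kernel=3, stride=2, padding=1)
print(out_stride1, out_stride2)
```

With stride=2 and padding=1 the sliding window starts on the padded border, so the padded elements take part in the strided case as well.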

Environment

TensorRT Version: 8.2.1.8
NVIDIA GPU: NVIDIA GeForce RTX 3080 Laptop GPU
NVIDIA Driver Version: 470.86
CUDA Version: 11.1
CUDNN Version: 8
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.7.11
PyTorch Version (if applicable): 1.9.0+cu111
Baremetal or Container (if so, version): Docker image based off nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04

Relevant Files

Test Code Gist

Steps To Reproduce

I am using ONNX-TensorRT to build and run the TensorRT engine, as shown in the script above. I think this should be reproducible with the TensorRT Python API too. If the script is run with stride=1, no mismatches are seen between PyTorch and TensorRT over the specified number of repeated runs. With stride=2 and the given input channel size of 64, however, only a small percentage of output elements match between PyTorch and TensorRT (10-50%). I am using FP16 here, but I also tried FP32, and that doesn't help for this stride either.
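How many elements count as a "match" depends heavily on the tolerance used. The gist is not reproduced here, so how it counts matches is unknown. The sketch below assumes an element passes if its absolute OR relative error is within tolerance, which is how the Polygraphy log later in this thread phrases it. `match_percentage` is a hypothetical helper, and the two value pairs are the min and max outputs printed in that stride=2 log.

```python
def match_percentage(actual, expected, atol, rtol):
    """Percentage of elements whose absolute OR relative error is within tolerance."""
    matches = 0
    for a, e in zip(actual, expected):
        abs_err = abs(a - e)
        # Relative error is undefined for a zero reference; treat it as infinite.
        rel_err = abs_err / abs(e) if e != 0 else float("inf")
        if abs_err <= atol or rel_err <= rtol:
            matches += 1
    return 100.0 * matches / len(actual)


# min and max output values reported for stride=2 (TensorRT vs. PyTorch)
trt = [-203.62, -34.656]
pyt = [-203.25, -34.594]

strict = match_percentage(trt, pyt, atol=1e-5, rtol=1e-5)
loose = match_percentage(trt, pyt, atol=1e-5, rtol=5e-3)
print(strict, loose)
```

Under the strict tolerance neither pair matches, while a relative tolerance of 5e-3 (an arbitrary illustrative value) accepts both, so a reported match percentage is only meaningful together with the tolerance behind it.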

schegde avatar Jan 27 '22 05:01 schegde

Can you try exporting the pytorch model to onnx and check the accuracy with polygraphy?

reference:
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/run/01_comparing_frameworks

zerollzeng avatar Jan 30 '22 02:01 zerollzeng

Yes, I tried with Polygraphy too, and I see the same type of mismatch for stride=2 and no mismatches for stride=1!

Test file

Test code Gist

Stride=2

[I] trt-runner-N9-01/31/22-22:43:00 | Activating and starting inference
[I] Configuring with profiles: [Profile().add(input, min=[1, 64, 9, 9], opt=[1, 64, 9, 9], max=[1, 64, 9, 9])]
[I] Building engine with configuration:
    Workspace | 200000000 bytes (190.73 MiB)
    Precision | TF32: False, FP16: True, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted | False
    Profiles | 1 profile(s)
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] Finished engine building in 0.256 seconds
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] trt-runner-N9-01/31/22-22:43:00
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] trt-runner-N9-01/31/22-22:43:00
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 5, 5)]}
[I] trt-runner-N9-01/31/22-22:43:00 | Completed 1 iteration(s) in 0.4537 ms | Average inference time: 0.4537 ms.
[I] pytorch-runner-N9-01/31/22-22:43:00 | Activating and starting inference
[I] pytorch-runner-N9-01/31/22-22:43:00
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] pytorch-runner-N9-01/31/22-22:43:00
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 5, 5)]}
[I] pytorch-runner-N9-01/31/22-22:43:00 | Completed 1 iteration(s) in 0.1144 ms | Average inference time: 0.1144 ms.
[I] Accuracy Comparison | trt-runner-N9-01/31/22-22:43:00 vs. pytorch-runner-N9-01/31/22-22:43:00
[I] Comparing Output: '2' (dtype=float16, shape=(1, 1, 5, 5)) with '2' (dtype=float16, shape=(1, 1, 5, 5)) | Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N9-01/31/22-22:43:00: 2 | Stats: mean=-145.5, std-dev=inf, var=inf, median=-146.25, min=-203.62 at (0, 0, 1, 4), max=-34.656 at (0, 0, 4, 0), avg-magnitude=145.5
[I] ---- Histogram ----
    Bin Range      | Num Elems | Visualization
    (-204 , -187 ) | 12 | ########################################
    (-187 , -170 ) | 0 |
    (-170 , -153 ) | 0 |
    (-153 , -136 ) | 2 | ######
    (-136 , -119 ) | 2 | ######
    (-119 , -102 ) | 0 |
    (-102 , -85.3) | 4 | #############
    (-85.3, -68.4) | 4 | #############
    (-68.4, -51.5) | 0 |
    (-51.5, -34.6) | 1 | ###
[I] pytorch-runner-N9-01/31/22-22:43:00: 2 | Stats: mean=-145.5, std-dev=inf, var=inf, median=-146.38, min=-203.25 at (0, 0, 1, 4), max=-34.594 at (0, 0, 4, 0), avg-magnitude=145.5
[I] ---- Histogram ----
    Bin Range      | Num Elems | Visualization
    (-204 , -187 ) | 12 | ########################################
    (-187 , -170 ) | 0 |
    (-170 , -153 ) | 0 |
    (-153 , -136 ) | 2 | ######
    (-136 , -119 ) | 2 | ######
    (-119 , -102 ) | 0 |
    (-102 , -85.3) | 4 | #############
    (-85.3, -68.4) | 4 | #############
    (-68.4, -51.5) | 0 |
    (-51.5, -34.6) | 1 | ###
[I] Error Metrics: 2
[I] Minimum Required Tolerance: elemwise error | [abs=0.625] OR [rel=0.0032368]
[I] Absolute Difference | Stats: mean=0.13745, std-dev=0.14368, var=0.02063, median=0.125, min=0 at (0, 0, 0, 2), max=0.625 at (0, 0, 3, 3), avg-magnitude=0.13745
[I] ---- Histogram ----
    Bin Range        | Num Elems | Visualization
    (0 , 0.0625)     | 6 | ##########################
    (0.0625, 0.125 ) | 4 | #################
    (0.125 , 0.188 ) | 9 | ########################################
    (0.188 , 0.25 )  | 1 | ####
    (0.25 , 0.312 )  | 2 | ########
    (0.312 , 0.375 ) | 0 |
    (0.375 , 0.438 ) | 2 | ########
    (0.438 , 0.5 )   | 0 |
    (0.5 , 0.562 )   | 0 |
    (0.562 , 0.625 ) | 1 | ####
[I] Relative Difference | Stats: mean=0.00091648, std-dev=0.000772, var=5.9605e-07, median=0.00079823, min=0 at (0, 0, 0, 2), max=0.0032368 at (0, 0, 3, 3), avg-magnitude=0.00091648
[I] ---- Histogram ----
    Bin Range            | Num Elems | Visualization
    (0 , 0.000324)       | 6 | ##################################
    (0.000324, 0.000648) | 4 | ######################
    (0.000648, 0.000971) | 7 | ########################################
    (0.000971, 0.0013 )  | 1 | #####
    (0.0013 , 0.00162 )  | 2 | ###########
    (0.00162 , 0.00194 ) | 4 | ######################
    (0.00194 , 0.00227 ) | 0 |
    (0.00227 , 0.00259 ) | 0 |
    (0.00259 , 0.00291 ) | 0 |
    (0.00291 , 0.00324 ) | 1 | #####
[E] FAILED | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['2']
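One way to put the absolute differences in this log in context is to express them in units of FP16 spacing. The sketch below assumes IEEE 754 half precision (10 explicit mantissa bits) and values in the normal range; `fp16_spacing` is a hypothetical helper written for illustration, not a Polygraphy or TensorRT function.

```python
import math


def fp16_spacing(x: float) -> float:
    """Gap between adjacent normal FP16 values near x (10 explicit mantissa bits)."""
    _, exp = math.frexp(abs(x))  # abs(x) == m * 2**exp with 0.5 <= m < 1
    return 2.0 ** (exp - 1 - 10)


step = fp16_spacing(-145.5)   # spacing near the reported mean output
max_ulps = 0.625 / step       # reported maximum absolute difference, in FP16 steps
median_ulps = 0.125 / step    # reported median absolute difference, in FP16 steps
print(step, max_ulps, median_ulps)
```

For magnitudes between 128 and 256, adjacent FP16 values are 0.125 apart, so the reported median difference is one step and the maximum is five steps, while the tolerance of 1e-05 is far finer than FP16 can resolve at this magnitude. Whether rounding alone accounts for the mismatch reported here is not established in this thread.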

Stride=1

[I] trt-runner-N0-01/31/22-22:43:13 | Activating and starting inference
[I] Configuring with profiles: [Profile().add(input, min=[1, 64, 9, 9], opt=[1, 64, 9, 9], max=[1, 64, 9, 9])]
[I] Building engine with configuration:
    Workspace | 200000000 bytes (190.73 MiB)
    Precision | TF32: False, FP16: True, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted | False
    Profiles | 1 profile(s)
[01/31/2022-22:43:14] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:14] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] Finished engine building in 1.775 seconds
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] trt-runner-N0-01/31/22-22:43:13
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] trt-runner-N0-01/31/22-22:43:13
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 9, 9)]}
[I] trt-runner-N0-01/31/22-22:43:13 | Completed 1 iteration(s) in 0.5305 ms | Average inference time: 0.5305 ms.
[I] pytorch-runner-N0-01/31/22-22:43:13 | Activating and starting inference
[I] pytorch-runner-N0-01/31/22-22:43:13
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] pytorch-runner-N0-01/31/22-22:43:13
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 9, 9)]}
[I] pytorch-runner-N0-01/31/22-22:43:13 | Completed 1 iteration(s) in 0.1192 ms | Average inference time: 0.1192 ms.
[I] Accuracy Comparison | trt-runner-N0-01/31/22-22:43:13 vs. pytorch-runner-N0-01/31/22-22:43:13
[I] Comparing Output: '2' (dtype=float16, shape=(1, 1, 9, 9)) with '2' (dtype=float16, shape=(1, 1, 9, 9)) | Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-01/31/22-22:43:13: 2 | Stats: mean=-138, std-dev=inf, var=inf, median=-144.5, min=-200.12 at (0, 0, 0, 7), max=-11.961 at (0, 0, 5, 8), avg-magnitude=138
[I] pytorch-runner-N0-01/31/22-22:43:13: 2 | Stats: mean=-138, std-dev=inf, var=inf, median=-144.5, min=-200.12 at (0, 0, 0, 7), max=-11.961 at (0, 0, 5, 8), avg-magnitude=138
[I] Error Metrics: 2
[I] Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0]
[I] Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0, 0), max=0 at (0, 0, 0, 0), avg-magnitude=0
[I] Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0, 0), max=0 at (0, 0, 0, 0), avg-magnitude=0
[I] PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] PASSED | All outputs matched | Outputs: ['2']

schegde avatar Jan 31 '22 22:01 schegde

FYI, I also tried TRT vs ONNX-RT (onnxruntime-gpu v1.10.0), and again got the same results as for TRT vs PyTorch :[

schegde avatar Feb 01 '22 00:02 schegde

@schegde did you solve it? There is no problem when padding=0 is used. The results differ from PyTorch only when padding=1 or higher.

DonggeunYu avatar May 18 '22 05:05 DonggeunYu

@schegde Could you try TRT 8.4 and see if the issue still exists? If it does, we will debug it. Thanks

nvpohanh avatar Jul 01 '22 06:07 nvpohanh

Same problem with TRT 8.4 and CUDA 10.2: stride=2 with padding produces different results between ONNX and TensorRT.

Howell-Yang avatar Jul 25 '22 02:07 Howell-Yang

@DonggeunYu @nvpohanh Sorry guys :( , I am no longer working on this project, as I switched my company!

schegde avatar Aug 12 '22 00:08 schegde

Closing issues that have been inactive for a long time. Thanks, all!

ttyio avatar Nov 23 '23 00:11 ttyio