TensorRT
Conv2d stride=2 accuracy mismatch between PyTorch and TensorRT
Description
A PyTorch Conv2d operation with stride=2, when converted to ONNX and run through TensorRT, shows an accuracy mismatch against PyTorch inference. The mismatch is not seen for stride=1. The layer signature is nn.Conv2d(64, 1, 3, stride=STRIDE, bias=False, padding=1).
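For reference, a minimal sketch of the layer in question (the 1x64x9x9 input shape is taken from the Polygraphy logs further down; everything else follows the signature above):

```python
import torch
import torch.nn as nn

STRIDE = 2  # the mismatch is reported for stride=2; stride=1 matches exactly
conv = nn.Conv2d(64, 1, 3, stride=STRIDE, bias=False, padding=1)

# With the 1x64x9x9 input used in the logs below, stride=2 gives a 1x1x5x5 output
# and stride=1 gives a 1x1x9x9 output.
out = conv(torch.randn(1, 64, 9, 9))
print(out.shape)  # torch.Size([1, 1, 5, 5])
```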
Environment
TensorRT Version: 8.2.1.8
NVIDIA GPU: NVIDIA GeForce RTX 3080 Laptop GPU
NVIDIA Driver Version: 470.86
CUDA Version: 11.1
CUDNN Version: 8
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.7.11
PyTorch Version (if applicable): 1.9.0+cu111
Baremetal or Container (if so, version): Docker image based on nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04
Relevant Files
Steps To Reproduce
I am using ONNX-TensorRT to build and run the TensorRT engine, as shown in the script above. I believe this should also be reproducible with the TensorRT Python API. If the script is run with stride=1, there are no mismatches between PyTorch and TensorRT across the specified number of repeated runs. With stride=2 and the given input channel count of 64, however, only a small fraction of output elements match between PyTorch and TensorRT (10-50%). I am using FP16 here, but I also tried FP32, and even that does not help for this stride.
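The original script is not attached in this excerpt, but the comparison roughly follows this pattern (a sketch only: the ONNX file name, opset version, run count, and tolerances are assumptions, and how to enable FP16 in the ONNX-TensorRT backend depends on its version, so this builds with defaults):

```python
import numpy as np
import onnx
import torch
import torch.nn as nn
import onnx_tensorrt.backend as backend  # ONNX-TensorRT

STRIDE = 2  # change to 1 to see the matching case
conv = nn.Conv2d(64, 1, 3, stride=STRIDE, bias=False, padding=1).half().cuda().eval()

# Export the layer to ONNX (file name and opset are placeholders).
dummy = torch.randn(1, 64, 9, 9, dtype=torch.float16, device="cuda")
torch.onnx.export(conv, dummy, "conv.onnx", input_names=["input"], opset_version=13)

# Build and run a TensorRT engine through the ONNX-TensorRT backend.
engine = backend.prepare(onnx.load("conv.onnx"), device="CUDA:0")

for _ in range(10):  # repeated runs with fresh random inputs
    x = torch.randn(1, 64, 9, 9, dtype=torch.float16, device="cuda")
    with torch.no_grad():
        ref = conv(x).cpu().numpy()
    trt_out = engine.run(x.cpu().numpy())[0]
    # Fraction of output elements that agree within a tight tolerance.
    print("matching elements:", np.isclose(trt_out, ref, atol=1e-5, rtol=1e-5).mean())
```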
Can you try exporting the PyTorch model to ONNX and checking the accuracy with Polygraphy?
Reference:
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy
https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy/examples/cli/run/01_comparing_frameworks
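The linked 01_comparing_frameworks example drives this from the CLI with roughly `polygraphy run model.onnx --trt --onnxrt`. A rough Python-API equivalent is sketched below; the model path and the fp16 setting are assumptions, and the logs further down were additionally produced against a PyTorch runner:

```python
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import (CreateConfig, EngineFromNetwork,
                                    NetworkFromOnnxPath, TrtRunner)
from polygraphy.comparator import Comparator

# Loaders for an ONNX-Runtime session and an FP16 TensorRT engine from the same model.
build_onnxrt_session = SessionFromOnnx("conv.onnx")
build_engine = EngineFromNetwork(NetworkFromOnnxPath("conv.onnx"),
                                 config=CreateConfig(fp16=True))

runners = [OnnxrtRunner(build_onnxrt_session), TrtRunner(build_engine)]

# Runs both backends on the same generated inputs and compares the outputs.
run_results = Comparator.run(runners)
assert bool(Comparator.compare_accuracy(run_results))
```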
Yes, I tried with Polygraphy too, and I see the same type of mismatch for stride=2 and no mismatches for stride=1!
Test file
Stride=2
[I] trt-runner-N9-01/31/22-22:43:00 | Activating and starting inference
[I] Configuring with profiles: [Profile().add(input, min=[1, 64, 9, 9], opt=[1, 64, 9, 9], max=[1, 64, 9, 9])]
[I] Building engine with configuration:
    Workspace | 200000000 bytes (190.73 MiB)
    Precision | TF32: False, FP16: True, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted | False
    Profiles | 1 profile(s)
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] Finished engine building in 0.256 seconds
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:00] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] trt-runner-N9-01/31/22-22:43:00
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] trt-runner-N9-01/31/22-22:43:00
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 5, 5)]}
[I] trt-runner-N9-01/31/22-22:43:00 | Completed 1 iteration(s) in 0.4537 ms | Average inference time: 0.4537 ms.
[I] pytorch-runner-N9-01/31/22-22:43:00 | Activating and starting inference
[I] pytorch-runner-N9-01/31/22-22:43:00
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] pytorch-runner-N9-01/31/22-22:43:00
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 5, 5)]}
[I] pytorch-runner-N9-01/31/22-22:43:00 | Completed 1 iteration(s) in 0.1144 ms | Average inference time: 0.1144 ms.
[I] Accuracy Comparison | trt-runner-N9-01/31/22-22:43:00 vs. pytorch-runner-N9-01/31/22-22:43:00
[I] Comparing Output: '2' (dtype=float16, shape=(1, 1, 5, 5)) with '2' (dtype=float16, shape=(1, 1, 5, 5)) | Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N9-01/31/22-22:43:00: 2 | Stats: mean=-145.5, std-dev=inf, var=inf, median=-146.25, min=-203.62 at (0, 0, 1, 4), max=-34.656 at (0, 0, 4, 0), avg-magnitude=145.5
[I] ---- Histogram ----
    Bin Range | Num Elems | Visualization
    (-204 , -187 ) | 12 | ########################################
    (-187 , -170 ) | 0 |
    (-170 , -153 ) | 0 |
    (-153 , -136 ) | 2 | ######
    (-136 , -119 ) | 2 | ######
    (-119 , -102 ) | 0 |
    (-102 , -85.3) | 4 | #############
    (-85.3, -68.4) | 4 | #############
    (-68.4, -51.5) | 0 |
    (-51.5, -34.6) | 1 | ###
[I] pytorch-runner-N9-01/31/22-22:43:00: 2 | Stats: mean=-145.5, std-dev=inf, var=inf, median=-146.38, min=-203.25 at (0, 0, 1, 4), max=-34.594 at (0, 0, 4, 0), avg-magnitude=145.5
[I] ---- Histogram ----
    Bin Range | Num Elems | Visualization
    (-204 , -187 ) | 12 | ########################################
    (-187 , -170 ) | 0 |
    (-170 , -153 ) | 0 |
    (-153 , -136 ) | 2 | ######
    (-136 , -119 ) | 2 | ######
    (-119 , -102 ) | 0 |
    (-102 , -85.3) | 4 | #############
    (-85.3, -68.4) | 4 | #############
    (-68.4, -51.5) | 0 |
    (-51.5, -34.6) | 1 | ###
[I] Error Metrics: 2
[I] Minimum Required Tolerance: elemwise error | [abs=0.625] OR [rel=0.0032368]
[I] Absolute Difference | Stats: mean=0.13745, std-dev=0.14368, var=0.02063, median=0.125, min=0 at (0, 0, 0, 2), max=0.625 at (0, 0, 3, 3), avg-magnitude=0.13745
[I] ---- Histogram ----
    Bin Range | Num Elems | Visualization
    (0 , 0.0625) | 6 | ##########################
    (0.0625, 0.125 ) | 4 | #################
    (0.125 , 0.188 ) | 9 | ########################################
    (0.188 , 0.25 ) | 1 | ####
    (0.25 , 0.312 ) | 2 | ########
    (0.312 , 0.375 ) | 0 |
    (0.375 , 0.438 ) | 2 | ########
    (0.438 , 0.5 ) | 0 |
    (0.5 , 0.562 ) | 0 |
    (0.562 , 0.625 ) | 1 | ####
[I] Relative Difference | Stats: mean=0.00091648, std-dev=0.000772, var=5.9605e-07, median=0.00079823, min=0 at (0, 0, 0, 2), max=0.0032368 at (0, 0, 3, 3), avg-magnitude=0.00091648
[I] ---- Histogram ----
    Bin Range | Num Elems | Visualization
    (0 , 0.000324) | 6 | ##################################
    (0.000324, 0.000648) | 4 | ######################
    (0.000648, 0.000971) | 7 | ########################################
    (0.000971, 0.0013 ) | 1 | #####
    (0.0013 , 0.00162 ) | 2 | ###########
    (0.00162 , 0.00194 ) | 4 | ######################
    (0.00194 , 0.00227 ) | 0 |
    (0.00227 , 0.00259 ) | 0 |
    (0.00259 , 0.00291 ) | 0 |
    (0.00291 , 0.00324 ) | 1 | #####
[E] FAILED | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['2']
Stride=1
[I] trt-runner-N0-01/31/22-22:43:13 | Activating and starting inference
[I] Configuring with profiles: [Profile().add(input, min=[1, 64, 9, 9], opt=[1, 64, 9, 9], max=[1, 64, 9, 9])]
[I] Building engine with configuration:
    Workspace | 200000000 bytes (190.73 MiB)
    Precision | TF32: False, FP16: True, INT8: False, Obey Precision Constraints: False, Strict Types: False
    Tactic Sources | ['CUBLAS', 'CUBLAS_LT', 'CUDNN']
    Safety Restricted | False
    Profiles | 1 profile(s)
[01/31/2022-22:43:14] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:14] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] Finished engine building in 1.775 seconds
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.3 but loaded cuBLAS/cuBLAS LT 11.3.0
[01/31/2022-22:43:15] [TRT] [W] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[I] trt-runner-N0-01/31/22-22:43:13
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] trt-runner-N0-01/31/22-22:43:13
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 9, 9)]}
[I] trt-runner-N0-01/31/22-22:43:13 | Completed 1 iteration(s) in 0.5305 ms | Average inference time: 0.5305 ms.
[I] pytorch-runner-N0-01/31/22-22:43:13 | Activating and starting inference
[I] pytorch-runner-N0-01/31/22-22:43:13
    ---- Inference Input(s) ----
    {input [dtype=float16, shape=(1, 64, 9, 9)]}
[I] pytorch-runner-N0-01/31/22-22:43:13
    ---- Inference Output(s) ----
    {2 [dtype=float16, shape=(1, 1, 9, 9)]}
[I] pytorch-runner-N0-01/31/22-22:43:13 | Completed 1 iteration(s) in 0.1192 ms | Average inference time: 0.1192 ms.
[I] Accuracy Comparison | trt-runner-N0-01/31/22-22:43:13 vs. pytorch-runner-N0-01/31/22-22:43:13
[I] Comparing Output: '2' (dtype=float16, shape=(1, 1, 9, 9)) with '2' (dtype=float16, shape=(1, 1, 9, 9)) | Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] trt-runner-N0-01/31/22-22:43:13: 2 | Stats: mean=-138, std-dev=inf, var=inf, median=-144.5, min=-200.12 at (0, 0, 0, 7), max=-11.961 at (0, 0, 5, 8), avg-magnitude=138
[I] pytorch-runner-N0-01/31/22-22:43:13: 2 | Stats: mean=-138, std-dev=inf, var=inf, median=-144.5, min=-200.12 at (0, 0, 0, 7), max=-11.961 at (0, 0, 5, 8), avg-magnitude=138
[I] Error Metrics: 2
[I] Minimum Required Tolerance: elemwise error | [abs=0] OR [rel=0]
[I] Absolute Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0, 0), max=0 at (0, 0, 0, 0), avg-magnitude=0
[I] Relative Difference | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0, 0, 0), max=0 at (0, 0, 0, 0), avg-magnitude=0
[I] PASSED | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] PASSED | All outputs matched | Outputs: ['2']
FYI, I also tried TRT vs. ONNX-RT (onnxruntime-gpu v1.10.0), and again got the same results as for TRT vs. PyTorch :[
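For that check, the ONNX-RT reference output can be produced roughly like this (a sketch only; the model path and input name are assumptions):

```python
import numpy as np
import onnxruntime as ort  # onnxruntime-gpu

# Run the exported model on the CUDA execution provider.
sess = ort.InferenceSession("conv.onnx", providers=["CUDAExecutionProvider"])
x = np.random.randn(1, 64, 9, 9).astype(np.float16)
(ort_out,) = sess.run(None, {"input": x})  # single FP16 output to compare against TRT
```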
@schegde did you solve it? There is no problem when padding=0 is used; the results differ from PyTorch only when padding=1 or higher.
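A quick sketch of that padding sweep (compare_with_trt is a hypothetical stand-in for the PyTorch-vs-TensorRT check sketched earlier):

```python
import torch.nn as nn

# Only padding >= 1 reportedly produces the mismatch at stride=2.
for padding in (0, 1, 2):
    conv = nn.Conv2d(64, 1, 3, stride=2, bias=False, padding=padding)
    # compare_with_trt(conv)  # hypothetical helper from the earlier repro sketch
```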
@schegde Could you try TRT 8.4 and see if the issue still exists? If it does, we will debug it. Thanks
Same problem with TRT 8.4 and CUDA 10.2; stride=2 with padding creates different results between ONNX and TensorRT.
@DonggeunYu @nvpohanh Sorry guys :( I am no longer working on this project, as I switched companies!
Closing issues that have been inactive for a long time. Thanks, all!