
❓ [Question] Running the same TorchScript with the same input produces different results

SeTriones opened this issue on Feb 10, 2022 · 4 comments

❓ Question

I'm trying to run a pretrained resnet50 model from torchvision.models with enabled_precisions set to torch.half. Each time I load the same resnet50 TorchScript and use the same input (all zeros, created with np.zeros), but after running several times I've found the output is not stable.
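
For reference, the TorchScript file loaded in the snippet under "Additional context" can be produced from torchvision roughly like this (a sketch only; the exact export step is not shown in the issue, and the file name is taken from the code below):

import torch
from torchvision import models

# Export a pretrained ResNet-50 to TorchScript so it can later be reloaded
# and compiled with torch_tensorrt.
model = models.resnet50(pretrained=True).eval()
scripted = torch.jit.script(model)
torch.jit.save(scripted, 'torch_script_module.ts')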

What you have already tried

I've tried two ways:

  1. Load the same resnet50 TorchScript and compile it, then do the inference. The output is not stable.
  2. Save the compiled script, load it on each run, and do the inference (sketched below). The output is stable.

I wonder whether there is some random behavior in torch_tensorrt.compile() when enabled_precisions is set to torch.half.
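
For reference, approach 2 looks roughly like the following (a minimal sketch; the file names are placeholders, and it assumes the compiled module can be saved with torch.jit.save, since the TorchScript frontend of Torch-TensorRT returns an ordinary ScriptModule):

import torch
import torch_tensorrt

# Compile once; approach 1 repeats this compilation on every run.
torch_script_module = torch.jit.load('torch_script_module.ts')
trt_ts_module = torch_tensorrt.compile(
    torch_script_module,
    inputs=[torch_tensorrt.Input(shape=[1, 3, 224, 224], dtype=torch.float32)],
    enabled_precisions={torch.half},
)

# Approach 2: persist the compiled module and reload it in later runs, so the
# embedded TensorRT engine (and its kernel selection) stays fixed.
torch.jit.save(trt_ts_module, 'trt_ts_module.ts')
reloaded = torch.jit.load('trt_ts_module.ts')
result = reloaded(torch.zeros(1, 3, 224, 224, device='cuda'))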

Environment

  • PyTorch Version : 1.10
  • CPU Architecture: x86_64
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): installed via pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
  • Are you using local sources or building from archives:
  • Python version: 3.6.9
  • CUDA version: 11.4
  • GPU models and configuration: pretrained resnet50 model from torchvision.models
  • Any other relevant information: Torch-TensorRT version: v1.0

Additional context

The Python code producing the unstable result is as follows:

from torchvision import models
import numpy as np
import torch
import torch_tensorrt
import time

input = np.zeros((1, 3, 224, 224)).astype(np.float32)
input = torch.from_numpy(input).cuda()

torch_script_module = torch.jit.load('torch_script_module.ts')

trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs=[
        torch_tensorrt.Input(  # Specify input object with shape and dtype
            min_shape=[1, 3, 224, 224],
            opt_shape=[1, 3, 224, 224],
            max_shape=[1, 3, 224, 224],
            # For static size shape=[1, 3, 224, 224]
            dtype=torch.float32)  # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
    ],
    enabled_precisions={torch.half})  # Run with FP16

result = trt_ts_module(input)  # run inference

t1 = time.time()
for i in range(1000):
    result = trt_ts_module(input)  # run inference
t2 = time.time()
print('result', result[0][0])
print('Cost: ', round(t2 - t1, 4))

Two runs of the script produce different outputs.

Run 1:

WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
result tensor(-0.4390, device='cuda:0')
Cost:  1.3429

Run 2:

WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
WARNING: [Torch-TensorRT] - TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.4.2
WARNING: [Torch-TensorRT] - TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.2.0
result tensor(-0.4463, device='cuda:0')
Cost:  1.3206

SeTriones · Feb 10, 2022

TensorRT performs "kernel auto-tuning", which essentially selects the fastest kernels for your model on your specific device. There can be a small amount of jitter in this step, for a variety of reasons, leading to different kernels being selected and thus slightly different performance (and, with FP16, slightly different numerical results).

You can check that the selected kernels are in fact different to confirm this.

Also, this looks like ~1.5% perf jitter for your model. Is this an issue in your application, or is it just out of curiosity? Have you seen larger variance between runs?

ncomly-nvidia · Feb 18, 2022
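
One indirect way to do the check suggested above is to compile the same TorchScript module twice in a single process and compare the outputs; if auto-tuning picked different FP16 kernels, the results will typically differ by an amount on the order of FP16 rounding. A minimal sketch, assuming the same 'torch_script_module.ts' file as in the original report:

import torch
import torch_tensorrt

ts = torch.jit.load('torch_script_module.ts')
x = torch.zeros(1, 3, 224, 224, device='cuda')

def build():
    # Each call re-runs TensorRT's kernel auto-tuning, so the two builds may
    # end up with different FP16 kernels.
    return torch_tensorrt.compile(
        ts,
        inputs=[torch_tensorrt.Input(shape=[1, 3, 224, 224], dtype=torch.float32)],
        enabled_precisions={torch.half},
    )

out_a = build()(x)
out_b = build()(x)
# Small differences here are consistent with auto-tuning jitter; large ones
# would point at a real conversion problem instead.
print('max abs diff:', (out_a - out_b).abs().max().item())

Raising the Torch-TensorRT/TensorRT log verbosity during each build should also print which tactics were selected, which can then be diffed directly between the two builds.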

@ncomly-nvidia this is just out of curiosity. I'm running more experiments on the following model architectures:

efficientnet-b2, ViT, yolov5s (v6.0), yolov5m (v6.0), yolov5x (v6.0), TSM (batch 16), SwinTransformer3D, BERT, Transformer

SeTriones · Feb 19, 2022

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot] · May 21, 2022

Hi @SeTriones, how have your other experiments gone? Are there other discrepancies in results or performance in the models listed above that concern you?

ncomly-nvidia · May 31, 2022

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot] · Aug 30, 2022