❓ [Question] Speed problem about TRTorch and Torch-TensorRT - Device Compatibility Check
Question 1
I find that a TorchScript model optimized by TRTorch 0.2.0 is faster than the equivalent TensorRT model (all models built through the Python API) for common architectures such as the ResNet and RepVGG series. Shouldn't the TensorRT model be the fastest? I want to know why this happens.
TorchScript model (optimized by TRTorch 0.2.0):
- torch 1.7.1+cu110
- trtorch 0.2.0
- TensorRT 7.2
- cuDNN 8.2
- GPU: Tesla T4
- CentOS Linux release 7.6.1810 (Core)
TensorRT model (.trt):
- torch 1.7.1+cu110
- tensorrt 8.2.0.6
- cuDNN 8.2
- GPU: Tesla T4
- CentOS Linux release 7.6.1810 (Core)
Question 2
I found that the inference speed of a TorchScript model differs depending on which version of Torch-TensorRT (TRTorch) was used to optimize the same architecture. For the same ResNet model, the TorchScript model optimized by TRTorch 0.2.0 (torch 1.7.1+cu110, TensorRT 7.2, cuDNN 8.2) is faster than the one optimized by Torch-TensorRT 1.0.0 (torch 1.10.1+cu113, TensorRT 8.0, cuDNN 8.2). Shouldn't the newer Torch-TensorRT 1.0.0 be faster? I'm also very confused.
- GPU: Tesla T4
- CentOS Linux release 7.6.1810 (Core)
- input shape: (1,3,224,224)
Here are some of my test results.
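For reference, a minimal sketch of the kind of CUDA-event timing loop typically used for such latency measurements; `run_inference` below is a hypothetical stand-in for the model's forward pass, not part of TRTorch or TensorRT:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder: substitute the TorchScript or TensorRT forward pass here.
void run_inference() {}

int main() {
  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  // Warm up so one-time initialization is excluded from the measurement.
  for (int i = 0; i < 10; ++i) run_inference();
  cudaDeviceSynchronize();

  const int iters = 100;
  cudaEventRecord(start);
  for (int i = 0; i < iters; ++i) run_inference();
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.0f;
  cudaEventElapsedTime(&ms, start, stop);
  printf("average latency: %.3f ms\n", ms / iters);

  cudaEventDestroy(start);
  cudaEventDestroy(stop);
  return 0;
}
```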
Between these two versions, a constant-time operation was added that checks the compatibility of the current device with the compiled model. This is likely the overhead you are experiencing.
We are investigating whether it can be mitigated for subsequent inferences once the model is loaded.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
This device check cannot currently be mitigated safely. We are investigating options in TRT to reduce this overhead.
Explore using cudaPointerAttributes (https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes) to query where the input data resides, and assume that device is the correct one?
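A minimal sketch of that idea, assuming the engine's target device can be inferred from the input pointer itself via cudaPointerGetAttributes (`device_of` is a hypothetical helper):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: returns the ordinal of the device that owns the
// allocation behind `ptr`, or -1 for host / unregistered memory.
int device_of(const void* ptr) {
  cudaPointerAttributes attr{};
  if (cudaPointerGetAttributes(&attr, ptr) != cudaSuccess) return -1;
  return (attr.type == cudaMemoryTypeDevice || attr.type == cudaMemoryTypeManaged)
             ? attr.device
             : -1;
}

int main() {
  void* buf = nullptr;
  cudaMalloc(&buf, 1024);
  printf("input resides on device %d\n", device_of(buf));
  cudaFree(buf);
  return 0;
}
```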
https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/
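The gist of that article: cudaGetDeviceProperties fills in every field of the property struct, some of which require slow queries, whereas cudaDeviceGetAttribute fetches a single attribute cheaply. A minimal sketch contrasting the two ways of reading the compute capability of device 0:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  // Slow path: populates the entire property struct.
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, 0);
  printf("full struct: sm_%d%d\n", prop.major, prop.minor);

  // Fast path: query only the attributes actually needed.
  int major = 0, minor = 0;
  cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, 0);
  cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, 0);
  printf("attributes:  sm_%d%d\n", major, minor);
  return 0;
}
```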
This check, added here, likely caused the perf issue: https://github.com/pytorch/TensorRT/blob/bf4474dc7816c184489d3985ce892315f5e0cc42/core/runtime/runtime.cpp#L81
The check invokes the constructor of the TensorRT device wrapper, RTDevice::RTDevice: https://github.com/pytorch/TensorRT/blob/bf4474dc7816c184489d3985ce892315f5e0cc42/core/runtime/RTDevice.cpp#L16
That constructor in turn calls cudaGetDeviceProperties, which is expensive, but the article above may be used to mitigate the issue.
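One possible mitigation, sketched below purely as an assumption (not the implemented fix): query the device properties once per device when the model is loaded and reuse the cached copy on every subsequent inference, keeping cudaGetDeviceProperties off the hot path. `cached_properties` is a hypothetical helper.

```cpp
#include <cuda_runtime.h>
#include <mutex>
#include <unordered_map>

// Hypothetical cache: pay the cudaGetDeviceProperties cost once per
// device, then serve repeated lookups from memory.
const cudaDeviceProp& cached_properties(int device) {
  static std::unordered_map<int, cudaDeviceProp> cache;
  static std::mutex m;
  std::lock_guard<std::mutex> lock(m);
  auto it = cache.find(device);
  if (it == cache.end()) {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, device);  // expensive, but only once
    it = cache.emplace(device, prop).first;
  }
  return it->second;
}

int main() {
  int dev = 0;
  cudaGetDevice(&dev);
  // Subsequent calls hit the cache instead of the driver.
  for (int i = 0; i < 1000; ++i) (void)cached_properties(dev);
  return 0;
}
```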
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.