
❓ [Question] Speed problem with TRTorch and Torch-TensorRT - Device Compatibility Check

yuezhuang1387 opened this issue 3 years ago

Question 1

I find that a TorchScript model optimized by TRTorch 0.2.0 runs faster than the equivalent TensorRT model (both built through the Python API), for common architectures such as the ResNet and RepVGG series. Shouldn't the TensorRT model be the fastest? I want to know why this happens.

TorchScript model (optimized by TRTorch 0.2.0):

  • torch 1.7.1+cu110
  • trtorch 0.2.0
  • TensorRT 7.2
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

TensorRT model (.trt):

  • torch 1.7.1+cu110
  • tensorrt 8.2.0.6
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

Question 2

I found that the inference speed of the optimized TorchScript model differs depending on which version of Torch-TensorRT (TRTorch) is used, even for the same architecture. For the same ResNet models, the TorchScript model optimized by TRTorch 0.2.0 (torch 1.7.1+cu110, TensorRT 7.2, cuDNN 8.2) is faster than the one optimized by Torch-TensorRT 1.0.0 (torch 1.10.1+cu113, TensorRT 8.0, cuDNN 8.2). Shouldn't the newer Torch-TensorRT 1.0.0 be faster? I'm also very confused by this.

  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)
  • input shape: (1, 3, 224, 224)

Here are some of my test results: [image: benchmark results]
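
A note on methodology: CUDA kernel launches are asynchronous, so a fair latency comparison needs warm-up iterations and a device synchronization before the clock stops. Below is a minimal libtorch sketch of such a timing loop; the model path and iteration counts are placeholders, and the same pattern applies from Python via torch.cuda.synchronize().

```cpp
// Minimal latency harness (sketch). "model_trt.ts" is a placeholder for a
// compiled TorchScript module; warm-up and iteration counts are arbitrary.
#include <torch/script.h>
#include <cuda_runtime.h>
#include <chrono>
#include <iostream>
#include <vector>

int main() {
  torch::jit::script::Module module = torch::jit::load("model_trt.ts");
  module.eval();
  torch::NoGradGuard no_grad;

  std::vector<torch::jit::IValue> inputs{
      torch::randn({1, 3, 224, 224}, torch::kCUDA)};

  for (int i = 0; i < 50; ++i) module.forward(inputs);  // warm-up
  cudaDeviceSynchronize();

  constexpr int kIters = 1000;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIters; ++i) module.forward(inputs);
  cudaDeviceSynchronize();  // drain queued kernels before stopping the clock
  auto end = std::chrono::steady_clock::now();

  std::cout << std::chrono::duration<double, std::milli>(end - start).count() /
                   kIters
            << " ms/iter\n";
}
```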

yuezhuang1387 · Feb 08 '22 06:02

Between these two versions, a constant-time operation was added that checks the compatibility of the current device against the device the model was compiled for. This is likely the overhead you are experiencing.

We are investigating whether this can be mitigated for future inferences once the model is loaded.
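
To make that concrete, here is a hedged sketch (not the actual Torch-TensorRT runtime code) of what "mitigate for future inferences" could look like: memoize the expensive property query after the first inference on a loaded engine. The comment marks the assumption that makes a naive cache unsafe, which, as noted later in the thread, is why this cannot simply be shipped.

```cpp
// Hypothetical sketch, not Torch-TensorRT code: memoize the device query so
// cudaGetDeviceProperties runs once per loaded engine, not on every call.
#include <cuda_runtime.h>
#include <mutex>
#include <stdexcept>

class CachedDeviceCheck {
 public:
  // expected_major/minor stand in for whatever the engine was built against.
  void verify(int expected_major, int expected_minor) {
    std::call_once(once_, [&] {
      int device = 0;
      cudaGetDevice(&device);
      cudaGetDeviceProperties(&prop_, device);  // the expensive call
    });
    // ASSUMPTION (the unsafe part): the active device never changes after
    // the first call. If it did, this cached result would be stale.
    if (prop_.major != expected_major || prop_.minor != expected_minor)
      throw std::runtime_error("engine was built for a different device");
  }

 private:
  std::once_flag once_;
  cudaDeviceProp prop_{};
};
```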

ncomly-nvidia · Feb 22 '22 16:02

This device check cannot currently be mitigated safely. We are investigating options in TRT to reduce this overhead.

ncomly-nvidia · Jan 03 '23 21:01

Explore using https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes to query data locations and assume the current device is correct?
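
A sketch of what that could look like (an assumed helper, not Torch-TensorRT code): ask the CUDA runtime which device owns the input tensor's memory and treat that as the active device, skipping the per-call property query entirely.

```cpp
#include <cuda_runtime.h>

// Sketch of the suggestion above: derive the device from the input tensor's
// data pointer. Returns the owning device ordinal, or -1 if the pointer is
// not device memory (or the query fails).
int device_of(const void* data_ptr) {
  cudaPointerAttributes attrs{};
  if (cudaPointerGetAttributes(&attrs, data_ptr) != cudaSuccess)
    return -1;
  return attrs.type == cudaMemoryTypeDevice ? attrs.device : -1;
}
```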

narendasan · Mar 23 '23 07:03

https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/

This check, added here, likely caused the perf issue: https://github.com/pytorch/TensorRT/blob/bf4474dc7816c184489d3985ce892315f5e0cc42/core/runtime/runtime.cpp#L81

It invokes the constructor of the TensorRT device wrapper, RTDevice::RTDevice: https://github.com/pytorch/TensorRT/blob/bf4474dc7816c184489d3985ce892315f5e0cc42/core/runtime/RTDevice.cpp#L16

That constructor calls cudaGetDeviceProperties, which is expensive; the approach in the article above may be used to mitigate the issue.
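
The blog's point, sketched below: cudaGetDeviceProperties populates the entire cudaDeviceProp struct (many queries, some of which go through the driver), while cudaDeviceGetAttribute fetches a single attribute and is far cheaper when the check only needs one or two fields, such as the compute capability.

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  int device = 0;
  cudaGetDevice(&device);

  // Expensive: fills every field of the cudaDeviceProp struct.
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, device);

  // Cheap: query only the attributes the compatibility check actually uses.
  int major = 0, minor = 0;
  cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device);
  cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device);

  std::printf("SM %d.%d (full struct reports %d.%d)\n",
              major, minor, prop.major, prop.minor);
  return 0;
}
```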

laikhtewari · Jun 20 '23 22:06
