
❓ [Question] How to solve this warning: Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled.

Open demuxin opened this issue 1 year ago • 6 comments

❓ Question

I used Torch-TensorRT to compile a TorchScript model in C++. When compiling or loading the compiled model, it prints many warnings:

WARNING: [Torch-TensorRT] - Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled. For more on the implications of this as well as workarounds, see the linked documentation (https://pytorch.org/TensorRT/user_guide/runtime.html#multi-device-safe-mode)
WARNING: [Torch-TensorRT] - Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled. For more on the implications of this as well as workarounds, see the linked documentation (https://pytorch.org/TensorRT/user_guide/runtime.html#multi-device-safe-mode)
WARNING: [Torch-TensorRT] - Detected this engine is being instantitated in a multi-GPU system with multi-device safe mode disabled. For more on the implications of this as well as workarounds, see the linked documentation (https://pytorch.org/TensorRT/user_guide/runtime.html#multi-device-safe-mode)

What you have already tried

I found this link useful, but it only covers the Python API.

I checked the source code, but I still haven't figured out how to set MULTI_DEVICE_SAFE_MODE from C++.

What can I do to address this warning?

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0):
  • CPU Architecture: x86
  • OS (e.g., Linux): ubuntu18
  • How you installed PyTorch (conda, pip, libtorch, source): libtorch
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version: 12.2
  • GPU models and configuration: 1080Ti
  • Any other relevant information:

demuxin avatar May 06 '24 09:05 demuxin

Moreover, when the model runs inference via the forward function, these warnings appear:

WARNING: [Torch-TensorRT] - Using default stream in enqueue()/enqueueV2()/enqueueV3() may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronizations. Please use non-default stream instead.
WARNING: [Torch-TensorRT] - Using default stream in enqueue()/enqueueV2()/enqueueV3() may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronizations. Please use non-default stream instead.
WARNING: [Torch-TensorRT] - Using default stream in enqueue()/enqueueV2()/enqueueV3() may lead to performance issues due to additional cudaDeviceSynchronize() calls by TensorRT to ensure correct synchronizations. Please use non-default stream instead.

demuxin avatar May 06 '24 10:05 demuxin

Hi @demuxin - thanks for the report. We likely need to add getter/setter methods to toggle this value in C++ as well: https://github.com/pytorch/TensorRT/blob/07c5b07b5ca81165682be90049a6bc67cff9928e/core/runtime/runtime.cpp#L10, similar to the following functions: https://github.com/pytorch/TensorRT/blob/07c5b07b5ca81165682be90049a6bc67cff9928e/core/runtime/register_jit_hooks.cpp#L121-L122

We are aware of the default stream warning and are working on it. From what I've seen, it should not have a substantial effect on inference.

gs-olive avatar May 06 '24 19:05 gs-olive
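For the default-stream warning specifically, the usual workaround in libtorch is to run `forward` on a non-default CUDA stream. A minimal sketch, assuming a CUDA-enabled libtorch build; `module` and `inputs` are placeholders for the caller's own objects, and this is not an official Torch-TensorRT recipe:

```cpp
// Sketch (untested): run inference on a non-default CUDA stream so TensorRT's
// enqueue path avoids the extra cudaDeviceSynchronize() on the default stream.
#include <torch/script.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>

torch::jit::IValue run_on_side_stream(torch::jit::script::Module& module,
                                      std::vector<torch::jit::IValue>& inputs) {
  // Take a stream from PyTorch's pool instead of the default stream.
  c10::cuda::CUDAStream stream = c10::cuda::getStreamFromPool();
  // Make it the current stream for this scope; restored on destruction.
  c10::cuda::CUDAStreamGuard guard(stream);
  auto out = module.forward(inputs);
  // Wait for the result before the stream goes out of scope.
  stream.synchronize();
  return out;
}
```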

These may actually already be accessible in C++. Prior to inference, could you try adding the line: torch_tensorrt::core::runtime::set_multi_device_safe_mode(true)

gs-olive avatar May 06 '24 19:05 gs-olive

Hi @gs-olive, that doesn't work; there is a compile error:

error: ‘set_multi_device_safe_mode’ is not a member of ‘torch_tensorrt::core::runtime’
   27 |         torch_tensorrt::core::runtime::set_multi_device_safe_mode(true);

demuxin avatar May 07 '24 01:05 demuxin

Does torch::ops::tensorrt::get_multi_device_safe_mode() exist, or does this also cause a compilation error?

gs-olive avatar May 08 '24 21:05 gs-olive

That fails with a similar compilation error:

error: ‘torch::ops’ has not been declared
   24 |         torch::ops::tensorrt::get_multi_device_safe_mode(true);

I searched the libtorch and Torch-TensorRT header files, and there are no functions related to multi_device_safe_mode.

demuxin avatar May 09 '24 01:05 demuxin
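Since the symbol is not exported in the public headers, one possible route is to invoke the TorchScript-registered operator by name from C++. This is an untested sketch: it assumes the op registered in register_jit_hooks.cpp is reachable through the c10 dispatcher under the schema name `tensorrt::set_multi_device_safe_mode`; if the schema is absent, `findSchemaOrThrow` will throw.

```cpp
// Sketch (untested): call the operator Torch-TensorRT registers for the
// Python binding, but from C++, via the c10 dispatcher. Assumes the library
// is linked so the op registration has already run.
#include <ATen/core/dispatch/Dispatcher.h>
#include <ATen/core/ivalue.h>
#include <vector>

void enable_multi_device_safe_mode() {
  auto handle = c10::Dispatcher::singleton().findSchemaOrThrow(
      "tensorrt::set_multi_device_safe_mode", /*overload_name=*/"");
  // Boxed calling convention: push arguments, results come back on the stack.
  std::vector<c10::IValue> stack;
  stack.emplace_back(true);
  handle.callBoxed(&stack);
}
```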