
šŸ› [Bug] Loading Torch-TensorRT models (.ts) on multiple GPUs (in TorchServe)

Open emilwallner opened this issue 2 years ago • 12 comments

Bug Description

Everything works well when I'm using 1 GPU, but as soon as I try to load a model on 4 separate GPUs, I get this error:

MODEL_LOG - RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false
MODEL_LOG - No compatible device was found for instantiating TensorRT engine

To Reproduce

Steps to reproduce the behavior:

Create a (.ts) model and load it on 4 different GPUs. I don't know if this is specific to TorchServe, or a general issue.

Here's the simple version (TorchServe Handler):

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    self.model = torch.jit.load('model.ts')
I'm not sure if it relates to this issue. From what I can tell, it seems like I need to restrict the CUDA context; however, the GPU is assigned in the handler. I tried the following, but it still gives the same error.

def initialize(self, ctx):
    properties = ctx.system_properties
    self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
    torch.cuda.set_device(self.device)
    torch_tensorrt.set_device(int(properties.get("gpu_id")))

    with torch.cuda.device(int(properties.get("gpu_id"))):
        self.model = torch.jit.load('model.ts')
        self.model.to(self.device)
        self.model.eval()

I also tried mapping the model straight to the GPU on load, but with the same problem.
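For context, the map-to-GPU-on-load attempt can be sketched as follows. This is a hypothetical sketch: `map_location` is the standard `torch.jit.load` argument, but the `device_string` helper is mine, introduced only for illustration.

```python
def device_string(gpu_id, cuda_available):
    # Build the device string the handler passes around, e.g. "cuda:2";
    # fall back to CPU when CUDA is not available.
    return f"cuda:{gpu_id}" if cuda_available else "cpu"

# Sketch only -- running it requires torch, CUDA, and the saved 'model.ts':
# import torch
# model = torch.jit.load("model.ts",
#                        map_location=device_string(gpu_id, torch.cuda.is_available()))
```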

Expected behavior

Load a .ts model on a specified GPU ID without any issues.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Official PyTorch image: nvcr.io/nvidia/pytorch:22.12-py3
  • GPUs: 4x NVIDIA A10G
  • PyTorch: 1.14.0a0+410ce96
  • Ubuntu 20.04 including Python 3.8
  • NVIDIA CUDA® 11.8.0
  • NVIDIA cuBLAS 11.11.3.6
  • NVIDIA cuDNN 8.7.0.84
  • NVIDIA NCCL 2.15.5 (optimized for NVIDIA NVLink®)
  • NVIDIA RAPIDS™ 22.10.01 (for x86, only these libraries are included: cudf, xgboost, rmm, cuml, and cugraph)
  • Apex
  • rdma-core 36.0
  • NVIDIA HPC-X 2.13
  • OpenMPI 4.1.4+
  • GDRCopy 2.3
  • TensorBoard 2.9.0
  • Nsight Compute 2022.3.0.0
  • Nsight Systems 2022.4.2.1
  • NVIDIA TensorRT™ 8.5.1
  • Torch-TensorRT 1.1.0a0
  • NVIDIA DALI® 1.20.0
  • MAGMA 2.6.2
  • JupyterLab 2.3.2 including Jupyter-TensorBoard
  • TransformerEngine 0.3.0

Additional context

emilwallner avatar May 05 '23 10:05 emilwallner

@gs-olive can you try to replicate this?

narendasan avatar May 08 '23 16:05 narendasan

Hello - I tried the following minimal example to reproduce the error:

  • Compile resnet18 on GPU 0
  • Load two instances of the same saved model (one on GPU 0, another on GPU 1, a GPU of the same type)
  • Run inference with both

While I was unable to reproduce the exact error as described, I did notice that the compiled model would only return results stored on GPU 0 (the GPU index it was compiled with), and not on other GPUs of the same type with other indices. This is an issue on our end, which I am looking into. As a temporary workaround, it might make sense to recompile the model for each unique GPU ID, saving the results as "model_gpu0.ts", "model_gpu1.ts", ..., and to see if this resolves the issue.
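The per-GPU workaround can be sketched as below. `torch_tensorrt.compile`, `torch.cuda.device`, and `torch.jit.save` are the real APIs; the loop, the input shape, and the `engine_filename` helper are illustrative assumptions.

```python
def engine_filename(gpu_id):
    # One saved engine per GPU index: "model_gpu0.ts", "model_gpu1.ts", ...
    return f"model_gpu{gpu_id}.ts"

# Sketch only -- running it requires torch, torch_tensorrt, CUDA, and an
# eager-mode `model` to compile:
# for gpu_id in range(torch.cuda.device_count()):
#     with torch.cuda.device(gpu_id):
#         trt_model = torch_tensorrt.compile(
#             model, inputs=[torch_tensorrt.Input((1, 3, 224, 224))])
#         torch.jit.save(trt_model, engine_filename(gpu_id))
```

The handler would then pick the file matching its assigned `gpu_id` instead of a single shared `model.ts`.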

I will also continue trying to reproduce the Expected most_compatible_device to be true but got false error.

gs-olive avatar May 09 '23 17:05 gs-olive

Very much appreciate you looking into this, and thanks for the suggested workaround! 🙌

emilwallner avatar May 10 '23 11:05 emilwallner

I met the same issue! With the same environment, I created the model on an NVIDIA A100 and then loaded it on an NVIDIA 3090, and got the error "RuntimeError: [Error thrown at core/runtime/TRTEngine.cpp:42] Expected most_compatible_device to be true but got false / No compatible device was found for instantiating TensorRT engine".

Both the A100 and the 3090 are based on the Ampere architecture.

NothingToSay99 avatar May 17 '23 02:05 NothingToSay99

For further context, I used the same Docker image (nvcr.io/nvidia/pytorch:22.12-py3) to compile and run the model, but it was compiled on an Ampere RTX A6000 and run on an A10. As mentioned earlier, it worked well with one GPU, but not in a multi-GPU configuration.

emilwallner avatar May 17 '23 14:05 emilwallner

Thank you both for the follow-up. After corresponding with @narendasan on this: compiling the model on an A100 and instantiating it on a 3090 fails because of the difference in compute capability (the A100 has Compute Capability 8.0, while the 3090 has Compute Capability 8.6, source).
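A quick way to check for this mismatch before loading a prebuilt engine: `torch.cuda.get_device_capability` is the real PyTorch API, and the comparison helper below is a hypothetical illustration.

```python
def same_compute_capability(cap_build, cap_run):
    # Engines built without hardware compatibility require an exact
    # compute-capability match; e.g. the A100 reports (8, 0) while the
    # RTX 3090 reports (8, 6), so an A100-built engine won't load there.
    return cap_build == cap_run

# On a live system: cap = torch.cuda.get_device_capability(device_index)
print(same_compute_capability((8, 0), (8, 6)))  # A100 engine vs. 3090 -> False
```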

As of TensorRT 8.6, there is newly added support for Hardware Compatibility, which should resolve this issue once we add support for the feature in Torch-TensorRT. A feature request is already filed for this: #1929.

gs-olive avatar May 17 '23 21:05 gs-olive

Thanks for your reply, looking forward to your work!

NothingToSay99 avatar May 18 '23 08:05 NothingToSay99

This issue has not seen activity for 90 days. Remove the stale label or add a comment, or this issue will be closed in 10 days.

github-actions[bot] avatar Aug 17 '23 00:08 github-actions[bot]

Hello - as an update on this issue, we recently added #2325 to main, which addresses compiling the model on one GPU and loading it on a different GPU (or multiple GPUs) of the same kind. This PR was intended to fix cases where the model would always load onto GPU 0. The feature adding hardware compatibility support (build on one GPU, run on a variety) is still planned for implementation in #1929.

gs-olive avatar Oct 24 '23 20:10 gs-olive

Excellent, thanks for the hard work and update!

emilwallner avatar Oct 25 '23 12:10 emilwallner

Hello - we recently added #2445, which enables the hardware_compatibility feature for TRT engines generated with ir="torch_compile" or ir="dynamo". If you are able to test out multi-GPU usage with hardware_compatible=True and ir="dynamo" (which also allows serialization via TorchScript), it would be much appreciated.
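A minimal sketch of what such a test could look like, assuming `hardware_compatible` and `ir` are passed as keyword arguments to `torch_tensorrt.compile` as described in #2445; the model and the input shape are placeholders, not part of the thread.

```python
# Compile-time options under test; both keys come from the PR referenced above.
COMPILE_SETTINGS = {
    "ir": "dynamo",               # Dynamo path, which also allows TorchScript serialization
    "hardware_compatible": True,  # build on one Ampere-or-newer GPU, run on others
}

# Sketch only -- running it requires torch, torch_tensorrt, and CUDA:
# trt_model = torch_tensorrt.compile(
#     model,
#     inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
#     **COMPILE_SETTINGS,
# )
```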

gs-olive avatar Jan 12 '24 22:01 gs-olive

Thanks @gs-olive!! I'm currently low on bandwidth, but I'll give this a spin for my next model!

emilwallner avatar Jan 14 '24 10:01 emilwallner