TensorRT 🐛 [Bug] RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false Expected input tensors to have device cuda, found device cpu

Bug Description

The code below produces the following error:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false
Expected input tensors to have device cuda, found device cpu

This same code works fine with Torch-TensorRT 1.4.0. When using the Dynamo backend, I get the following error:

Unsupported: dynamic shape operator: aten.masked_select.default
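For context, `aten.masked_select` has a data-dependent output shape. Its output length is the number of `True` elements in the mask, so it cannot be inferred from the input shapes alone. The pure-PyTorch sketch below is CPU-only and uses toy tensors chosen for illustration, not the model's real inputs. It shows two same-shaped inputs producing outputs of different sizes, which is presumably why the Dynamo path reports the op as a dynamic shape operator.

```python
import torch

# Two masks with identical shapes but different contents.
values = torch.arange(6)
mask_a = torch.tensor([True, False, True, False, True, False])
mask_b = torch.tensor([True, True, True, True, True, False])

out_a = torch.masked_select(values, mask_a)
out_b = torch.masked_select(values, mask_b)

# The output length equals the number of True entries in the mask,
# which is only known once the mask values exist at runtime.
print(out_a.shape)  # torch.Size([3])
print(out_b.shape)  # torch.Size([5])
```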

To Reproduce

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = torch.masked_select(batch_idxs, mask)
        rows = torch.masked_select(rows, mask)
        cols = torch.masked_select(cols, mask)
        pn_feats = torch.masked_select(pn_feats.reshape(-1, C), mask[:, None])
        pn_feats = pn_feats.reshape(len(rows), C)
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        ir="torchscript",
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())


if __name__ == "__main__":
    main()

Expected behavior

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

Torch-TensorRT Version (e.g. 1.0.0): 2.0.0dev and 2.2.0
PyTorch Version (e.g. 1.0): 2.2
CPU Architecture: i7-12800H
OS (e.g., Linux): Linux
How you installed PyTorch (conda, pip, libtorch, source): nvcr.io/nvidia/pytorch:24.01-py3
Build command you used (if compiling from source):
Are you using local sources or building from archives:
Python version: 3.10.12
CUDA version: 12.2
GPU models and configuration: GeForce RTX 3080 Ti
Any other relevant information:

Additional context

Apr 11 '24 13:04 airalcorn2

Hi - thanks for the report - I am able to reproduce the issue. For a quick workaround, try one of the following replacements:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)


##### Replace the above with:


pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).cuda()
batch_idxs = torch.arange(N).repeat_interleave(P).cuda()

##### or

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C, device=pn_feats.device)
batch_idxs = torch.arange(N, device=pillar_pixels.device).repeat_interleave(P)

##### or 

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

It seems that `.to()` called with a tensor as its argument (which should match both the dtype and the device of that tensor) is not being interpreted correctly here.
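One difference between the workarounds is easy to miss. `x.to(other)` matches both the dtype and the device of `other`, while `x.to(other.device)` and `device=` at construction only set the device, so the index tensor keeps its default `int64` dtype. The CPU-only sketch below uses a small `int32` stand-in for `pillar_pixels` (as passed via `.int()` in the repro) to show this. The shapes are illustrative assumptions.

```python
import torch

# Toy stand-in for pillar_pixels after .int(): an int32 tensor.
pillar_pixels = torch.zeros(1, 4, 2, dtype=torch.int32)
N, P = 1, 4

# .to(tensor) matches BOTH the dtype and the device of the argument.
idx_like_tensor = torch.arange(N).repeat_interleave(P).to(pillar_pixels)

# .to(tensor.device) only moves the data; the dtype stays int64.
idx_device_only = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

# Passing device= at construction also keeps the default int64 dtype.
idx_ctor = torch.arange(N, device=pillar_pixels.device).repeat_interleave(P)

print(idx_like_tensor.dtype, idx_device_only.dtype, idx_ctor.dtype)
```

So the workarounds are not strictly identical to the original code: the resulting `batch_idxs` is `int64` rather than the dtype of `pillar_pixels`.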

With respect to the Dynamo path, I have opened #2747 to add support for aten.masked_select.default. Does the model successfully compile with fallback (running aten.masked_select.default in Torch) when using ir="dynamo"?

Apr 12 '24 02:04 gs-olive

Thanks for the workarounds, @gs-olive! I went with:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

for consistency's sake, and that worked for me.

For the Dynamo path, I tried enabling fallback with:

trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions=enabled_precisions,
    truncate_long_and_double=True,
    torch_executed_ops=["aten::masked_select"],
)

but got the same error. However, I just noticed that an earlier error is raised before the second one:

DynamicOutputShapeException: aten.masked_select.default

The above exception was the direct cause of the following exception:

Apr 12 '24 13:04 airalcorn2

That newly discovered error in the Dynamo path led me to this issue and this issue, which was fixed here according to the comments.

Apr 12 '24 13:04 airalcorn2

Interestingly, this code, which just uses normal boolean indexing, seems to work with the Dynamo path:

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(
            pn_feats.device
        )
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = batch_idxs[mask]
        rows = rows[mask]
        cols = cols[mask]
        pn_feats = pn_feats.reshape(-1, C)[mask]
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        min_block_size=1,
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())
    print((pt_preds == trt_preds[0]).sum())


if __name__ == "__main__":
    main()
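For this pattern, boolean indexing and `torch.masked_select` select the same values. When a row mask of shape `(N*P, 1)` is broadcast across `C`, `masked_select` returns the selected rows flattened in row-major order. Reshaping to `(-1, C)` then reproduces `x[mask]`. The CPU-only sketch below uses toy sizes and values chosen for illustration to check this equivalence. It does not say anything about how either form is lowered by Torch-TensorRT.

```python
import torch

torch.manual_seed(0)
N, P, C = 1, 6, 3
pn_feats = torch.rand(N, P, C)
rows = torch.tensor([5, -1, 2, -1, 0, 7])
mask = rows != -1

# Original formulation: masked_select flattens, so reshape back to (-1, C).
selected = torch.masked_select(pn_feats.reshape(-1, C), mask[:, None])
selected = selected.reshape(int(mask.sum()), C)

# Boolean-indexing formulation from the working Dynamo example.
indexed = pn_feats.reshape(-1, C)[mask]

# The 1-D case (rows, cols, batch_idxs) is equivalent as well.
rows_selected = torch.masked_select(rows, mask)
rows_indexed = rows[mask]

print(torch.equal(selected, indexed), torch.equal(rows_selected, rows_indexed))
```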

Apr 12 '24 13:04 airalcorn2

When using PyTorch 2.3.1 and Torch-TensorRT 2.3.0, I still get the same error for the original code (i.e., with ir="torchscript"), but when using the Dynamo backend or output_format="torchscript", I don't get an error and trt_preds == pt_preds.
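Because `enabled_precisions` includes `torch.half`, TensorRT is allowed to run layers in FP16, so exact equality between `pt_preds` and `trt_preds` is not guaranteed in general. The sketch below shows one way to report both exact and tolerance-based agreement. `report_match` is a hypothetical helper, and the `rtol`/`atol` defaults are arbitrary assumptions, not values taken from Torch-TensorRT.

```python
import torch


def report_match(reference, candidate, rtol=1e-3, atol=1e-3):
    """Hypothetical helper: summarize how closely two outputs agree."""
    # Bring the candidate onto the reference's device and dtype first.
    candidate = candidate.to(device=reference.device, dtype=reference.dtype)
    if reference.shape != candidate.shape:
        return {"same_shape": False, "exact": False, "close": False,
                "num_equal": 0, "numel": reference.numel()}
    return {
        "same_shape": True,
        "exact": torch.equal(reference, candidate),
        "close": torch.allclose(reference, candidate, rtol=rtol, atol=atol),
        "num_equal": int((reference == candidate).sum()),
        "numel": reference.numel(),
    }


if __name__ == "__main__":
    ref = torch.rand(1, 4, 4, 3)
    # Simulate an FP16 round trip on the candidate output.
    print(report_match(ref, ref.half()))
```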

Jun 11 '24 19:06 airalcorn2
