🐛 [Bug] RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false Expected input tensors to have device cuda, found device cpu

Open airalcorn2 opened this issue 1 year ago • 5 comments

Bug Description

The code below produces the following error:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false
Expected input tensors to have device cuda, found device cpu

This same code works fine with Torch-TensorRT 1.4.0. When using the Dynamo backend, I get the following error:

Unsupported: dynamic shape operator: aten.masked_select.default
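
For context, a likely reason the Dynamo path flags this operator is that the length of the torch.masked_select output depends on how many mask entries are True, so it is only known once the data is seen. A minimal eager-mode sketch (CPU only, with made-up values that are not from this report):

```python
import torch

values = torch.arange(6)  # illustrative data, not from the report

# Two masks of identical shape that select different numbers of elements.
mask_a = torch.tensor([True, False, True, False, False, False])
mask_b = torch.tensor([True, True, True, True, False, False])

out_a = torch.masked_select(values, mask_a)
out_b = torch.masked_select(values, mask_b)

# Same input shapes, different output shapes: the result size is data-dependent.
print(out_a.shape, out_b.shape)
```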

To Reproduce

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = torch.masked_select(batch_idxs, mask)
        rows = torch.masked_select(rows, mask)
        cols = torch.masked_select(cols, mask)
        pn_feats = torch.masked_select(pn_feats.reshape(-1, C), mask[:, None])
        pn_feats = pn_feats.reshape(len(rows), C)
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        ir="torchscript",
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())


if __name__ == "__main__":
    main()

Expected behavior

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.0.0dev and 2.2.0
  • PyTorch Version (e.g. 1.0): 2.2
  • CPU Architecture: i7-12800H
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): nvcr.io/nvidia/pytorch:24.01-py3
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.10.12
  • CUDA version: 12.2
  • GPU models and configuration: GeForce RTX 3080 Ti
  • Any other relevant information:

Additional context

airalcorn2 avatar Apr 11 '24 13:04 airalcorn2

Hi - thanks for the report - I am able to reproduce the issue. For a quick workaround, try one of the following replacements:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)


##### Replace the above with:


pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).cuda()
batch_idxs = torch.arange(N).repeat_interleave(P).cuda()

##### or

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C, device=pn_feats.device)
batch_idxs = torch.arange(N, device=pillar_pixels.device).repeat_interleave(P)

##### or 

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

It seems that calling .to with a tensor as the argument (rather than a device) is not being interpreted correctly in the TorchScript path.
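
For reference, this is how the call forms differ in eager PyTorch: .to(other_tensor) adopts both the dtype and the device of other_tensor, while .to(other_tensor.device) changes only the device. The sketch below runs on CPU and uses a float16 stand-in named ref (hypothetical, in place of pn_feats) so the dtype difference is visible. It does not reproduce the TorchScript failure itself.

```python
import torch

ref = torch.ones(2, 3, dtype=torch.float16)  # hypothetical stand-in for pn_feats

like_tensor = torch.zeros(2, 3).to(ref)           # takes dtype and device from ref
like_device = torch.zeros(2, 3).to(ref.device)    # takes device only; stays float32
on_device = torch.zeros(2, 3, device=ref.device)  # allocated on the device directly

# Spell out both attributes explicitly when the dtype must match as well.
explicit = torch.zeros(2, 3).to(device=ref.device, dtype=ref.dtype)

print(like_tensor.dtype, like_device.dtype, on_device.dtype, explicit.dtype)
```

In the reproduction script both tensors are already float32, so switching to the device-only forms should not change the dtype there; that may not hold for other models.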

With respect to the Dynamo path, I have created #2747 to add support for aten.masked_select.default. Does the model compile successfully with fallback (running aten.masked_select.default in Torch) when using ir="dynamo"?

gs-olive avatar Apr 12 '24 02:04 gs-olive

Thanks for the workarounds, @gs-olive! I went with:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

for consistency's sake and that worked for me.

For the Dynamo path, I tried enabling fallback with:

trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions=enabled_precisions,
    truncate_long_and_double=True,
    torch_executed_ops=["aten::masked_select"],
)

but I got the same error. However, I just noticed that an earlier exception is raised before that one:

DynamicOutputShapeException: aten.masked_select.default

The above exception was the direct cause of the following exception:

airalcorn2 avatar Apr 12 '24 13:04 airalcorn2

That newly discovered error in the Dynamo path led me to this issue and this issue, which was fixed here according to the comments.

airalcorn2 avatar Apr 12 '24 13:04 airalcorn2

Interestingly, this code, which just uses normal boolean indexing, seems to work with the Dynamo path:

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(
            pn_feats.device
        )
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = batch_idxs[mask]
        rows = rows[mask]
        cols = cols[mask]
        pn_feats = pn_feats.reshape(-1, C)[mask]
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        min_block_size=1,
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())
    print((pt_preds == trt_preds[0]).sum())


if __name__ == "__main__":
    main()
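
As a quick eager-mode check that the two formulations select the same elements, the sketch below compares torch.masked_select with boolean indexing on small made-up tensors (CPU only; the shapes and values are illustrative, not those of the script above). It says nothing about how either form is lowered by Torch-TensorRT.

```python
import torch

C = 4
feats = torch.arange(3 * C, dtype=torch.float32).reshape(-1, C)  # 3 rows, C columns
rows = torch.tensor([5, -1, 7])
mask = rows != -1  # keeps rows 0 and 2

# masked_select formulation: broadcast the mask over the columns, get a flat
# result, then restore the (num_kept, C) shape.
selected_a = torch.masked_select(feats, mask[:, None]).reshape(int(mask.sum()), C)

# Boolean-indexing formulation: a 1-D mask on dim 0 keeps whole rows.
selected_b = feats[mask]

print(torch.equal(selected_a, selected_b))
```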

airalcorn2 avatar Apr 12 '24 13:04 airalcorn2

When using PyTorch 2.3.1 and Torch-TensorRT 2.3.0, I still get the same error for the original code (i.e., with ir="torchscript"). With the Dynamo backend or output_format="torchscript", however, I don't get an error and trt_preds == pt_preds.

airalcorn2 avatar Jun 11 '24 19:06 airalcorn2