🐛 [Bug] RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false Expected input tensors to have device cuda, found device cpu
Bug Description
The code below produces the following error:
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false
Expected input tensors to have device cuda, found device cpu
This same code works fine with Torch-TensorRT 1.4.0. When using the Dynamo backend, I get the following error:
Unsupported: dynamic shape operator: aten.masked_select.default
To Reproduce
import torch
import torch_tensorrt
from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = torch.masked_select(batch_idxs, mask)
        rows = torch.masked_select(rows, mask)
        cols = torch.masked_select(cols, mask)
        pn_feats = torch.masked_select(pn_feats.reshape(-1, C), mask[:, None])
        pn_feats = pn_feats.reshape(len(rows), C)
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1
    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)
    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        ir="torchscript",
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())


if __name__ == "__main__":
    main()
Expected behavior
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): 2.0.0dev and 2.2.0
- PyTorch Version (e.g. 1.0): 2.2
- CPU Architecture: i7-12800H
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): nvcr.io/nvidia/pytorch:24.01-py3
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.10.12
- CUDA version: 12.2
- GPU models and configuration: GeForce RTX 3080 Ti
- Any other relevant information:
Additional context
Hi - thanks for the report - I am able to reproduce the issue. For a quick workaround, try one of the following replacements:
pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)
##### Replace the above with:
pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).cuda()
batch_idxs = torch.arange(N).repeat_interleave(P).cuda()
##### or
pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C, device=pn_feats.device)
batch_idxs = torch.arange(N, device=pillar_pixels.device).repeat_interleave(P)
##### or
pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
It seems that calling .to with a tensor as its argument (rather than a device) is not being interpreted correctly in the TorchScript path.
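One detail worth keeping in mind when choosing among these replacements: Tensor.to(other) matches both the dtype and the device of other, whereas Tensor.to(other.device) only moves the tensor and leaves its dtype alone. The sketch below uses small illustrative tensors on the CPU (the names are not from the model) to show the difference. For batch_idxs this means the device-only workaround keeps the default int64 dtype instead of adopting the dtype of pillar_pixels.

```python
import torch

# Illustrative tensors; CPU is used so the sketch runs without a GPU.
reference = torch.zeros(4, dtype=torch.int32)
idxs = torch.arange(3)  # default dtype is int64

# .to(tensor): adopts the dtype AND the device of `reference`.
as_reference = idxs.to(reference)

# .to(device): only moves the tensor; the dtype is unchanged.
device_only = idxs.to(reference.device)

print(as_reference.dtype)  # torch.int32
print(device_only.dtype)   # torch.int64
```

If the dtype cast is needed as well, idxs.to(device=reference.device, dtype=reference.dtype) spells out both explicitly. Whether that form sidesteps the TorchScript-path error has not been checked here.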
With respect to the Dynamo path, I have added #2747 to add support for aten.masked_select.default. Does the model successfully compile with fallback (running aten.masked_select.default in Torch) when using ir="dynamo"?
Thanks for the workarounds, @gs-olive! I went with:
pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
for consistency's sake and that worked for me.
For the Dynamo path, I tried enabling fallback with:
trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions=enabled_precisions,
    truncate_long_and_double=True,
    torch_executed_ops=["aten::masked_select"],
)
but was getting the same error. However, I just noticed there's actually an earlier error raised before the second error:
DynamicOutputShapeException: aten.masked_select.default
The above exception was the direct cause of the following exception:
That newly discovered error in the Dynamo path led me to this issue and this issue, which was fixed here according to the comments.
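The DynamicOutputShapeException is consistent with how masked_select behaves: the length of its output depends on how many mask entries are true, so it cannot be derived from the input shapes alone. A small CPU-only sketch with made-up values illustrates this; the two calls have identical input shapes but produce outputs of different sizes.

```python
import torch

values = torch.arange(6)

# Same input shape, different mask contents.
mask_a = torch.tensor([True, False, True, False, True, False])
mask_b = torch.tensor([True, True, True, True, True, False])

out_a = torch.masked_select(values, mask_a)
out_b = torch.masked_select(values, mask_b)

print(out_a.shape)  # torch.Size([3])
print(out_b.shape)  # torch.Size([5])
```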
Interestingly, this code, which just uses normal boolean indexing, seems to work with the Dynamo path:
import torch
import torch_tensorrt
from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(
            pn_feats.device
        )
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = batch_idxs[mask]
        rows = rows[mask]
        cols = cols[mask]
        pn_feats = pn_feats.reshape(-1, C)[mask]
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1
    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)
    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        min_block_size=1,
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())
    print((pt_preds == trt_preds[0]).sum())


if __name__ == "__main__":
    main()
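For anyone comparing the two versions: in eager PyTorch they should select the same rows. Boolean indexing with a 1-D mask on the first dimension keeps whole rows, which matches masked_select with the mask broadcast across columns followed by a reshape. A CPU-only sketch with small illustrative shapes (not the sizes from the report) checks this:

```python
import torch

torch.manual_seed(0)
C = 4
feats = torch.rand(6, C)
rows = torch.tensor([2, -1, 0, -1, 5, 1])
mask = rows != -1

# Original formulation: flat select, then restore the column dimension.
via_masked_select = torch.masked_select(feats, mask[:, None]).reshape(-1, C)

# Rewritten formulation: index rows directly with the boolean mask.
via_indexing = feats[mask]

print(torch.equal(via_masked_select, via_indexing))  # True
```

This only shows that the rewrite preserves eager-mode results; it says nothing about why the two forms are handled differently during compilation.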
When using PyTorch 2.3.1 and Torch-TensorRT 2.3.0, I still get the same error for the original code (i.e., with ir="torchscript"), but when using the Dynamo backend or output_format="torchscript", I don't get an error and trt_preds == pt_preds.