🐛 [Bug] Encountered bug when using Torch-TensorRT (We don't have an op for aten::floor_divide but it isn't a special case)
Bug Description
Hi, I'm trying to compile a model with torch_tensorrt. I was able to successfully create the scripted model but when compiling it I'm getting the following error:
```
INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
DEBUG: [Torch-TensorRT] - Settings requested for Lowering:
    torch_executed_modules: [
    ]
Traceback (most recent call last):
  File "test.py", line 103, in <module>
    trt_model = torch_tensorrt.compile(scripted_model,
  File "/media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/test_env/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/test_env/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 113, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":607, please report a bug to PyTorch. We don't have an op for aten::floor_divide but it isn't a special case. Argument types: int, int,
Candidates:
    aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    aten::floor_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> (Tensor(a!))
```
I don't know which part of the model is causing this error yet; I'll post a minimal version once I figure it out. However, I think the Torch-TensorRT 1.1.0 release is supposed to support floor_divide.
This is what I'm doing to compile the model:
```python
model.eval().cuda()
scripted_model = torch.jit.script(model)

with torch_tensorrt.logging.debug():
    trt_model = torch_tensorrt.compile(
        scripted_model,
        inputs=[torch_tensorrt.Input((1, 3, 16, 344, 344))],
        enabled_precisions={torch.half},
        workspace_size=1 << 20,
        truncate_long_and_double=True,
        require_full_compilation=False,  # True
    )
```
Expected behavior
I was expecting floor_divide to be supported in the 1.1.0 release based on the information given here: https://github.com/pytorch/TensorRT/releases
Environment
- Torch-TensorRT Version: 1.1.0
- PyTorch Version: 1.11.0+cu113
- CPU Architecture: x86_64
- OS: Ubuntu 20.04
- How you installed PyTorch: pip
- Python version: 3.8
- CUDA version: 11.3
- GPU models and configuration: NVIDIA GeForce RTX 3070
From what I can tell, the issue is that there is some operation in your model of the form:

```
aten::floor_divide(int self, int other) -> ...
```

which does not seem to be a valid TorchScript operator, which is why PyTorch (not necessarily us) is reporting this issue.
I took a look at PyTorch and saw there is an aten::floordiv.int operator, which would make sense for the input types you have. The question is why this model has floor_divide and not floordiv, if that is indeed what the operation is supposed to be. Perhaps we are inserting it erroneously in some lowering pass (this is just a theory based on limited info).
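As a quick sanity check of that theory, you can script a bare int // int in isolation and inspect which operator TorchScript emits. This is just an illustrative experiment, not taken from the model above; the expectation is that it lowers to aten::floordiv rather than the tensor-only aten::floor_divide:

```python
import torch

@torch.jit.script
def int_floordiv(a: int, b: int) -> int:
    # Python's // on annotated ints should lower to aten::floordiv,
    # not the tensor-only aten::floor_divide seen in the error above
    return a // b

print(int_floordiv.graph)   # inspect the emitted operator
print(int_floordiv(7, 2))
```

If the graph here shows floordiv while the failing model's graph shows floor_divide on ints, that would point at whatever pass is rewriting the op.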
A quick debugging step you can take would be to grep for instances of aten::floor_divide in the debug logs, specifically in the lowered graph. TorchScript includes source locations, so that may help you narrow down which part of your code is emitting aten::floor_divide.
This is the only lowering pass that uses floor_divide, but it seems to use the tensor variant, so it is probably not the root cause: https://cs.github.com/pytorch/TensorRT/blob/679ea2179aaaf28fd16203d610315ddf9ea8dfe8/core/lowering/passes/reduce_remainder.cpp?q=repo%3Apytorch%2Ftensorrt+aten%3A%3Afloor_divide+language%3AC%2B%2B
@narendasan Thank you for your response. So far it seems like what's causing the issue is the modulus operator (%).
The following network gives the same error:
```python
import torch
import torch.nn as nn
from torch import Tensor

class SomeNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: Tensor) -> Tensor:
        input_shape = x.shape
        if input_shape[2] % 2 == 0:
            return x
        else:
            return torch.tensor(0)
```
This is the TorchScript graph I am getting back (PyTorch 1.12.1):
```
torchtrt39 ❯ python /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py
graph(%self : __torch__.SomeNet,
      %x.1 : Tensor):
  %16 : bool = prim::Constant[value=0]()
  %14 : NoneType = prim::Constant()
  %5 : int = prim::Constant[value=2]() # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:23
  %10 : int = prim::Constant[value=0]() # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:33
  %input_shape.1 : int[] = aten::size(%x.1) # <string>:13:9
  %8 : int = aten::__getitem__(%input_shape.1, %5) # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:11
  %9 : int = aten::remainder(%8, %5) # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:11
  %11 : bool = aten::eq(%9, %10) # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:11
  %26 : Tensor = prim::If(%11) # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:12:8
    block0():
      -> (%x.1)
    block1():
      %17 : Tensor = aten::tensor(%10, %14, %14, %16) # /Users/naren/Developer/py/pytorch_org/tensorrt/experiments/1305.py:15:18
      -> (%17)
  return (%26)
```
I don't seem to see the floor_divide. Maybe this changed in 1.12?
Seems like it's the same graph in PyTorch 1.11 as well.
@narendasan Sorry for the late reply!
Yes, I'm getting the same graph, but Torch-TensorRT still gives the same error:
```
graph(%self : __torch__.MoViNet_pytorch.movinets.models.SomeNet,
      %x.1 : Tensor):
  %16 : bool = prim::Constant[value=0]()
  %14 : NoneType = prim::Constant()
  %5 : int = prim::Constant[value=2]() # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:23
  %10 : int = prim::Constant[value=0]() # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:33
  %input_shape.1 : int[] = aten::size(%x.1) # <string>:13:9
  %8 : int = aten::__getitem__(%input_shape.1, %5) # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:11
  %9 : int = aten::remainder(%8, %5) # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:11
  %11 : bool = aten::eq(%9, %10) # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:11
  %26 : Tensor = prim::If(%11) # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:11:8
    block0():
      -> (%x.1)
    block1():
      %17 : Tensor = aten::tensor(%10, %14, %14, %16) # /media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/MoViNet_pytorch/movinets/models.py:14:18
      -> (%17)
  return (%26)
```

```
Traceback (most recent call last):
  File "test.py", line 112, in <module>
    trt_model = torch_tensorrt.compile(model,
  File "/media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/test_env/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 115, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/media/andrea/Disk_21/Desktop/ARES/leav-action-recognition-pipeline/test_env/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 113, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":607, please report a bug to PyTorch. We don't have an op for aten::floor_divide but it isn't a special case. Argument types: int, int,
Candidates:
    aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    aten::floor_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> (Tensor(a!))
```
I replaced % with the following function, and now I can run the code without error!
```python
def modulo(a: int, b: int) -> int:
    return int(a - b * torch.floor(torch.div(a, b)))
```
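For reference, that replacement is just the standard floor-mod identity a mod b = a − b·⌊a/b⌋. A torch-free sketch of the same identity, using math.floor purely to sanity-check the arithmetic, matches Python's % operator:

```python
import math

def modulo(a: int, b: int) -> int:
    # floor-mod identity: a mod b == a - b * floor(a / b)
    return int(a - b * math.floor(a / b))

print(modulo(7, 3), 7 % 3)    # both 1
print(modulo(-7, 3), -7 % 3)  # both 2 (floor-mod, like Python's %)
```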
@narendasan seems a lowering pass could be a good WAR here.