🐛 [Bug] Advanced Indexing/GatherND compilation causes error: `isObject() INTERNAL ASSERT FAILED at "libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None`
Bug Description
PyTorch implements GatherND-style gathers via advanced indexing. For example, given a tensor x,
x[[0,1], :, None, torch.tensor((0,1))]
is considered a valid indexing operation. However, when applying TensorRT compilation, we see the following error:
isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None
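To make the indexing semantics concrete, here is a minimal sketch in plain PyTorch (no TensorRT) of what a two-tensor index computes; the tensor names are illustrative only:

```python
import torch

x = torch.randn(3, 4, 5)

# Two index tensors applied to dims 0 and 2 gather jointly (GatherND-style):
# out[i, :] == x[rows[i], :, cols[i]].
rows = torch.tensor([1, 0])
cols = torch.tensor([1, 0])
out = x[rows, :, cols]  # shape (2, 4)

# Equivalent explicit gather, for reference.
ref = torch.stack([x[r, :, c] for r, c in zip(rows.tolist(), cols.tolist())])
assert torch.equal(out, ref)
```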
To Reproduce
Run with nvcr.io/nvidia/pytorch:22.07-py3 container (gathernd.py):
```python
import tensorrt
import torch
import torch_tensorrt


class GatherNDListModel(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x[[1, 0], :, [1, 0]]


class GatherNDTensorModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.index = torch.tensor((1, 0))

    def forward(self, x):
        return x[self.index, :, self.index]


if __name__ == '__main__':
    torch.manual_seed(0)
    t = torch.randn(3, 4, 5).cuda()

    try:
        m = GatherNDListModel()
        o = m(t)
        print(f"PyTorch: {o.shape}")
        mt = torch_tensorrt.compile(m, inputs=[t], truncate_long_and_double=True)
        o = mt(t)
        print(f"TRT: {o.shape}")
    except Exception as err:
        print(f"GatherND list indexing failed: {err}")

    try:
        m = GatherNDTensorModel()
        o = m(t)
        print(f"PyTorch: {o.shape}")
        mt = torch_tensorrt.compile(m, inputs=[t], truncate_long_and_double=True)
        o = mt(t)
        print(f"TRT: {o.shape}")
    except Exception as err:
        print(f"GatherND tensor indexing failed: {err}")
```
Outputs the following:
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
GatherND list indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
GatherND tensor indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None
Note that the PyTorch (i.e. non-TRT) model outputs the expected shapes, but the TRT conversion fails.
Expected behavior
The TRT compilation should succeed and produce the same output as the non-TRT version.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
nvcr.io/nvidia/pytorch:22.07-py3
Additional context
https://partners.nvidia.com/Bug/ViewBug/3735309
I have a fix for this error, as well as for the general None case, but we hit the current limitation where the converter does not support indexing with more than one tensor: https://github.com/pytorch/TensorRT/blob/c63a5a5717e28d1a195740356fffbb9afcff36a8/core/conversion/converters/impl/select.cpp#L287
@ruoqianguo Do you have more information on this limitation? Is this something we can address but have not had time for, or is it a fundamental TensorRT limitation?
@narendasan The complete function looks like this: https://github.com/pytorch/TensorRT/pull/921#discussion_r823279468. I think we can address this problem, and I will try to support multiple index inputs.
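For context, one way to express a multi-tensor index as a single gather is to fold the indexed dimensions together. This is only a rough sketch of the idea in plain PyTorch, not the converter code:

```python
import torch

# Rough sketch: two index tensors over dims 0 and 2 of a (D0, D1, D2) input
# can be folded into one flat index over a reshaped view, so a single gather
# suffices.
x = torch.randn(3, 4, 5)
idx0 = torch.tensor([1, 0])  # indices into dim 0
idx2 = torch.tensor([1, 0])  # indices into dim 2

# Bring the two indexed dims together, flatten them, then gather once.
x_perm = x.permute(0, 2, 1).reshape(-1, x.shape[1])  # (D0 * D2, D1)
flat_idx = idx0 * x.shape[2] + idx2                  # combined index into the flattened dim
out = x_perm[flat_idx]                               # (2, D1)

assert torch.equal(out, x[idx0, :, idx2])
```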
Just tested this on master d1768aa3d2c7d7d91d9f061e3e5dc5f976124dfe, built in the NGC pytorch:22.08-py3 container, but I'm still seeing the same errors when running the above script:
root@2e1a9b9e2880:/opt/TensorRT# python /scripts/gathernd.py
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
GatherND list indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
GatherND tensor indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None
In the container pytorch/pytorch:1.12.0-cuda11.3-cudnn8-devel, I just tested this script on master d1768aa3d2c7d7d91d9f061e3e5dc5f976124dfe, and the results look correct. Could these results be related to the container?
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float32. The compiler is going to use the user setting Float32
WARNING: [Torch-TensorRT] - If indices include negative values, the exported graph will produce incorrect results.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.3.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
TRT: torch.Size([2, 4])
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float32. The compiler is going to use the user setting Float32
WARNING: [Torch-TensorRT] - If indices include negative values, the exported graph will produce incorrect results.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.3.2
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.
TRT: torch.Size([2, 4])
Verified on v1.2 again. Closing. @chaoz-dev please comment if you're still facing this issue.
Keeping this open since we are seeing some intermittent issues.
Tested the latest release 1.2.0 using NGC nvcr.io/nvidia/pytorch:22.09-py3. Running the above script, I'm still seeing the same errors:
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float32. The compiler is going to use the user setting Float32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
**GatherND list indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None**
PyTorch: torch.Size([2, 4])
WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float32. The compiler is going to use the user setting Float32
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Int64 to Int32
**GatherND tensor indexing failed: isObject() INTERNAL ASSERT FAILED at "bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/ivalue_inl.h":123, please report a bug to PyTorch. Expected Object but got None**
@ruoqianguo for viz.
Your output above looks correct though, so this should work... Checking the 1.2 release, it looks like it should contain d1768aa3d2c7d7d91d9f061e3e5dc5f976124dfe, so we should be ahead of the working commit here...
@chaoz-dev is this ok to close?
This is good to close on my side