TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
## Bug Description
`torch.ops.aten.remainder.Scalar` seems to return an fmod result when the input number is big.

## To Reproduce
Save the script below and run it:
```
import torch
import torch.nn as...
```
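For context on what the report is comparing: `torch.remainder` follows Python modulo semantics (the result takes the divisor's sign), while `torch.fmod` follows C semantics (the result takes the dividend's sign), so the two already disagree in plain PyTorch whenever the operands have opposite signs. A minimal sketch, not taken from the issue:

```py
import torch

# torch.remainder: Python-style modulo, result sign follows the divisor.
# torch.fmod: C-style fmod, result sign follows the dividend.
x = torch.tensor([-7.0, 7.0, -100000.0])
print(torch.remainder(x, 3.0))  # tensor([2., 1., 2.])
print(torch.fmod(x, 3.0))       # tensor([-1., 1., -1.])
```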
## To Reproduce
```py
import torch
import torch.nn as nn
import torch_tensorrt

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 2, 2)
        self.beta = nn.Parameter(torch.ones((1, 3, 1, 1), dtype=torch.float))...
```
## Bug Description
Cannot load `quantize_fp8` even though `modelopt[all]` is installed:
```
WARNING:torch_tensorrt.dynamo.conversion.aten_ops_converters:Unable to import quantization op. Please install modelopt library (https://github.com/NVIDIA/TensorRT-Model-Optimizer?tab=readme-ov-file#installation) to add support for compiling quantized models
WARNING:py.warnings:/usr/lib64/python3.11/tempfile.py:904:...
```
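One way to narrow this down is to check whether modelopt's quantization subpackage imports at all in the same environment. The import path below is an assumption based on the linked TensorRT-Model-Optimizer documentation, not something stated in the issue:

```py
# Hedged sketch: verify that modelopt's quantization subpackage is importable.
# The path `modelopt.torch.quantization` is assumed from the modelopt docs;
# adjust it if the installed package layout differs.
try:
    import modelopt.torch.quantization as mtq
    print("modelopt quantization importable:", mtq.__name__)
except ImportError as err:
    print("modelopt quantization not importable:", err)
```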
## Bug Description
When replacing view nodes with reshape nodes, the metadata of the original view nodes is assigned to the reshape nodes in the wrong order. For example,...
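For context, a lowering pass of this shape typically walks the graph, creates one reshape node per view node, and copies that view node's `meta` onto its own replacement so the metadata stays paired with the right node. The sketch below is an illustrative version of that pattern, not the actual torch_tensorrt pass:

```py
import torch
from torch.fx import GraphModule

def replace_view_with_reshape(gm: GraphModule) -> GraphModule:
    """Illustrative sketch: swap aten.view for aten.reshape, copying each
    view node's metadata onto the reshape node that replaces it."""
    for node in list(gm.graph.nodes):
        if node.op == "call_function" and node.target == torch.ops.aten.view.default:
            with gm.graph.inserting_after(node):
                reshape = gm.graph.call_function(
                    torch.ops.aten.reshape.default, args=node.args, kwargs=node.kwargs
                )
            reshape.meta = node.meta.copy()  # metadata comes from the matching view node
            node.replace_all_uses_with(reshape)
            gm.graph.erase_node(node)
    gm.graph.lint()
    gm.recompile()
    return gm
```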
🐛 [Bug] Importing `torchao` first breaks `torch_tensorrt.dynamo.compile` during `run_decompositions`
## Bug Description
Importing `torchao` before importing `torch_tensorrt` causes `F.interpolate` to fail during `run_decompositions` with:
`AssertionError: Expected aten.upsample_nearest2d.default to have CompositeImplicitAutograd kernel`

## To Reproduce
```py
import torchao
import torch_tensorrt...
```
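A hedged sketch of what a full reproduction likely looks like, given the truncated snippet above: import `torchao` first, then compile a module whose `forward` calls `F.interpolate`. The module, input shape, and compile arguments here are illustrative assumptions, not taken from the issue:

```py
import torchao  # imported before torch_tensorrt, as the issue describes
import torch
import torch.nn.functional as F
import torch_tensorrt

class Upsample(torch.nn.Module):
    def forward(self, x):
        # lowers to aten.upsample_nearest2d, the op named in the assertion error
        return F.interpolate(x, scale_factor=2, mode="nearest")

x = torch.randn(1, 3, 8, 8).cuda()
ep = torch.export.export(Upsample().eval().cuda(), (x,))
trt_mod = torch_tensorrt.dynamo.compile(ep, inputs=[x])  # reported to fail in run_decompositions
```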
## Bug Description
Remove `TRTElementWiseOp` and `TRTTensor`:
```py
from torch_tensorrt.dynamo.types import TRTElementWiseOp, TRTTensor
```

## To Reproduce
Steps to reproduce the behavior:
1.
2.
3.

## Expected behavior

##...
# Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change....
## Bug Description
In [the CI tests](https://github.com/pytorch/TensorRT/actions/runs/11191996718/job/31115878141?pr=3167), all cumsum tests failed on CUDA 11.8 and 12.1 but pass on 12.4. The error looks like:
```
FAILED conversion/test_cumsum_aten.py::TestCumsumConverter::test_cumsum_1D_0 -...
```
## Bug Description
When `debug=True` and the `rich` module is available, `_RichMonitor` makes engine building take significantly longer, and the performance of the built engine also decreases. Using `_ASCIIMonitor` (i.e....
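A rough way to quantify the build-time part of this is to compile the same module with the `debug` flag on and off and compare wall-clock time. The model, input shape, and compile settings below are illustrative assumptions; only the `debug=True` behavior comes from the report:

```py
import time
import torch
import torch_tensorrt

model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU()).eval().cuda()
x = torch.randn(1, 3, 224, 224).cuda()

for debug in (True, False):
    start = time.perf_counter()
    # per the report, debug=True with rich installed routes progress through _RichMonitor
    trt_mod = torch_tensorrt.compile(model, ir="dynamo", inputs=[x], debug=debug)
    print(f"debug={debug}: compiled in {time.perf_counter() - start:.1f}s")
```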
# Description
Improves the logging in the converter registry

Fixes # (issue)

## Type of change
Please delete options that are not relevant and/or add your own.
- New feature...