❓ [Question] compiled ExportedProgram is slower than uncompiled model
❓ Question
I tried compiling a few models with torch_tensorrt.compile(model, inputs, ir='dynamo', ...) and each one of them was slower than the respective uncompiled model. I was wondering if I was using torch_tensorrt incorrectly.
What you have already tried
A minimum example:
import torch
import torch_tensorrt
import time
model = torch.hub.load('pytorch/vision:v0.10.0', 'mobilenet_v2', pretrained=True)
model.eval().cuda()
inputs = [
    torch_tensorrt.Input(
        shape=torch.Size((1, 3, 480, 640)),
        dtype=torch.float,
    )
]
trt_model = torch_tensorrt.compile(model, inputs=inputs, ir='dynamo', truncate_long_and_double=True, enabled_precisions={torch.half}, opt_level='max')
The inference time was measured as below:
x = torch.rand((1, 3, 480, 640)).cuda() - 0.5
# warm up
for _ in range(10):
    trt_model(x)
total_time = 0
for _ in range(20):
    start = time.time()
    out = trt_model(x)
    total_time += time.time() - start
print(total_time / 20)
On average, inference takes 4 ms with the uncompiled model and 9 ms with the compiled model.
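One caveat about the timing loop above: CUDA work is launched asynchronously, so wall-clock timing around a call can miss GPU time that has not finished yet unless the device is synchronized before the clock is read. Below is a minimal sketch of a synchronized timing helper. The name `benchmark` and its parameters are hypothetical and not part of torch or torch_tensorrt; the helper only assumes the model is a callable and that a synchronization function is passed in.

```python
import time


def benchmark(fn, x, warmup=10, iters=20, sync=None):
    """Return the mean seconds per call of fn(x).

    `sync` is a zero-argument callable that blocks until pending device
    work is done (hypothetical parameter; defaults to a no-op for CPU code).
    """
    sync = sync or (lambda: None)

    # Warm-up calls are excluded from the measurement.
    for _ in range(warmup):
        fn(x)
    sync()  # make sure warm-up work has finished before starting the clock

    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    sync()  # wait for all timed calls to finish before stopping the clock
    return (time.perf_counter() - start) / iters


# Assumed usage with the objects from the post (requires a CUDA device):
#   benchmark(trt_model, x, sync=torch.cuda.synchronize)
#   benchmark(model, x, sync=torch.cuda.synchronize)
```

Timing both models with the same helper keeps the comparison consistent. Whether it changes the 4 ms and 9 ms figures is not something this sketch can tell.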
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 2.2.1
- CPU Architecture: x86_64
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip install torch torch_tensorrt
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version: 3.11
- CUDA version: 12.3
- GPU models and configuration: NVIDIA GeForce RTX 4050
- Any other relevant information:
Additional context
Update:
In comparison, trt_model = torch.compile(model, backend='tensorrt', options={'truncate_long_and_double': True, 'enabled_precisions': {torch.half}}) reduces the inference time to 0.3ms. However, JIT compilation is infeasible for my use case, because recompiling on every run would take too long.
Update 2:
Using ir='dynamo' with output_format='fx' does accelerate the model. Was I using the ExportedProgram incorrectly?