TensorRT
❓ [Question] Why is the TensorRT model slower?
❓ Question
Why is the TensorRT model slower? I tried TensorRT on an MHA (multi-head attention) model, but found it is even slower than the JIT-scripted model.
What you have already tried
I tested the original model, the JIT-scripted model, the JIT model after optimization, and the TensorRT model, and found that the TensorRT model is not as fast as I expected. The model here is a simple MHA module modified from fairseq so that it could pass compilation.
```python
import time

import tensorrt  # imported so the TensorRT libraries are loaded
import torch
import torch_tensorrt as torch_trt

import tmp_attn


def timer(m, i):
    # Run 10000 forward passes, using the same tensor for query, key, and value.
    st = time.time()
    for _ in range(10000):
        m(i, i, i)
    ed = time.time()
    return ed - st


t1 = torch.randn(64, 1, 1280, device="cuda:0")
model = tmp_attn.MultiheadAttention(1280, 8).to("cuda:0")
model2 = torch.jit.script(model)
model3 = torch.jit.optimize_for_inference(model2)
model4 = torch_trt.compile(model, inputs=[t1, t1, t1]).to("cuda:0")

print("Original Model", timer(model, t1))
print("Jit Script Model", timer(model2, t1))
print("Jit Script Model after optimization", timer(model3, t1))
print("TensorRT Model", timer(model4, t1))
```
I ran each model 10,000 times and recorded the elapsed time. The output is:

```
Original Model 5.6981117725372314
Jit Script Model 4.5694739818573
Jit Script Model after optimization 3.3332810401916504
TensorRT Model 4.772718667984009
```
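One thing worth checking: CUDA kernel launches are asynchronous, so timing with `time.time()` alone can start or stop the clock while work is still queued on the GPU, and the first iterations pay one-time costs. Below is a minimal sketch of a timing harness with warm-up iterations and explicit synchronization; the helper name and iteration counts are my own, not from the script above.

```python
import time
import torch


def timer_synced(m, i, warmup=100, iters=10000):
    # Hypothetical helper: warm-up passes exclude one-time costs (lazy init, autotuning).
    with torch.no_grad():
        for _ in range(warmup):
            m(i, i, i)
        torch.cuda.synchronize()  # drain queued kernels before starting the clock
        st = time.time()
        for _ in range(iters):
            m(i, i, i)
        torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
    return time.time() - st
```

Re-running the comparison with a harness like this would make numbers across machines more directly comparable.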
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 1.11.0
- CPU Architecture: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
- OS (e.g., Linux): Linux, CentOS7
- How you installed PyTorch (conda, pip, libtorch, source): conda
- Build command you used (if compiling from source): /
- Are you using local sources or building from archives: No
- Python version: 3.7
- CUDA version: 11.7
- GPU models and configuration:
- TensorRT version: 8.2.5.1
- Torch_tensorrt version: 1.1.0
Additional context
The code of the MHA module is in the attached tmp_attn.py.
More information: I also tested TensorRT on an EncoderLayer module, which is basically the attention module above plus some FC (fully-connected) layers, layer norm, and dropout. There, TensorRT achieves about a 2x speedup.
| Module | MHA time (s) | MHA (rel.) | EncoderLayer time (s) | EncoderLayer (rel.) |
|---|---|---|---|---|
| Original Module | 4.46 | 1 | 10.77 | 1 |
| Jit Scripted Module | 4.42 | 0.99 | 9.612 | 0.89 |
| Jit Module with Optimization | 2.9 | 0.65 | 5.775 | 0.53 |
| TensorRT | 4.34 | 0.97 | 4.875 | 0.45 |
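As an aside, reduced precision is often what unlocks most of TensorRT's advantage on attention-style blocks. A hedged sketch, assuming the torch_tensorrt 1.1 compile API and the model/input shapes from the script above (whether FP16 is numerically acceptable for this MHA module would need to be verified):

```python
import torch
import torch_tensorrt as torch_trt

# Spec for each of the three MHA inputs (query, key, value), matching t1's shape.
spec = torch_trt.Input((64, 1, 1280), dtype=torch.float32)

# Let TensorRT choose FP16 kernels where beneficial, falling back to FP32.
model_trt_fp16 = torch_trt.compile(
    model,
    inputs=[spec, spec, spec],
    enabled_precisions={torch.float32, torch.half},
)
```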
I'm getting this performance with your script:
Original Model 3.2848336696624756
Jit Script Model 2.7592527866363525
Jit Script Model after optimization 2.0758402347564697
TensorRT Model 1.4786508083343506
Is there any significant difference between your environment and mine?
Hi @geekinglcq, here are my env details:
PyTorch Version (e.g., 1.0): 1.11.0
CPU Architecture: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
OS (e.g., Linux): Linux, Ubuntu 20.04
How you installed PyTorch (conda, pip, libtorch, source): pip
Build command you used (if compiling from source): /
Are you using local sources or building from archives: yes
Python version: 3.8
CUDA version: 11.3
GPU models and configuration:
TensorRT version: 8.2.4.2
Torch_tensorrt version: 1.1.0
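For anyone else comparing setups, here is a quick way to dump the relevant versions on any machine (standard version attributes of each package; nothing project-specific):

```python
import torch
import tensorrt
import torch_tensorrt

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
print("TensorRT:", tensorrt.__version__)
print("Torch-TensorRT:", torch_tensorrt.__version__)
```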
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
@geekinglcq have you found anything? I'm hitting the same problem, though with a different model and a different TensorRT version.