TensorRT
❓ [Question] Why is the TensorRT model slower?
❓ Question
Why is the TensorRT model slower? I tried TensorRT on an MHA (multi-head attention) model, but found it is even slower than the JIT-scripted model.
What you have already tried
I tested the original model, the JIT-scripted model, the JIT model after optimization, and the TensorRT model, and found that the TensorRT model is not as fast as I expected. The model here is a simple MHA module modified from fairseq so that it could pass compilation.
```python
import time

import tensorrt  # imported so the TensorRT libraries are loaded
import torch
import torch_tensorrt as torch_trt

import tmp_attn


def timer(m, i):
    # Run 10000 forward passes, using the same tensor for query, key, and value.
    st = time.time()
    for _ in range(10000):
        m(i, i, i)
    ed = time.time()
    return ed - st


t1 = torch.randn(64, 1, 1280, device="cuda:0")
model = tmp_attn.MultiheadAttention(1280, 8).to("cuda:0")
model2 = torch.jit.script(model)
model3 = torch.jit.optimize_for_inference(model2)
model4 = torch_trt.compile(model, inputs=[t1, t1, t1]).to("cuda:0")

print("Original Model", timer(model, t1))
print("Jit Script Model", timer(model2, t1))
print("Jit Script Model after optimization", timer(model3, t1))
print("TensorRT Model", timer(model4, t1))
```
I ran each model 10,000 times and recorded the elapsed time. The output is:

```
Original Model 5.6981117725372314
Jit Script Model 4.5694739818573
Jit Script Model after optimization 3.3332810401916504
TensorRT Model 4.772718667984009
```
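One thing worth checking: CUDA kernel launches are asynchronous, so timing with `time.time()` alone can start or stop the clock while work is still queued on the GPU, and the first iterations pay one-time costs. Below is a minimal sketch of a timing harness with warm-up iterations and explicit synchronization; the helper name and iteration counts are my own, not from the script above.

```python
import time
import torch


def timer_synced(m, i, warmup=100, iters=10000):
    # Hypothetical helper: warm-up passes exclude one-time costs (lazy init, autotuning).
    with torch.no_grad():
        for _ in range(warmup):
            m(i, i, i)
        torch.cuda.synchronize()  # drain queued kernels before starting the clock
        st = time.time()
        for _ in range(iters):
            m(i, i, i)
        torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
    return time.time() - st
```

Re-running the comparison with a harness like this would make numbers across machines more directly comparable.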
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- PyTorch Version (e.g., 1.0): 1.11.0
- CPU Architecture: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz
- OS (e.g., Linux): Linux, CentOS7
- How you installed PyTorch (conda, pip, libtorch, source): conda
- Build command you used (if compiling from source): /
- Are you using local sources or building from archives: No
- Python version: 3.7
- CUDA version: 11.7
- GPU models and configuration:
- TensorRT version: 8.2.5.1
- Torch_tensorrt version: 1.1.0
Additional context
The code of the MHA module is in the attached tmp_attn.py.
More information: I also tested TensorRT on an EncoderLayer module, which is basically the attention module above plus some FC (fully-connected) layers, layer norm, and dropout. There, TensorRT achieves about a 2x speedup.
| Module | MHA time (s) | MHA (rel.) | EncoderLayer time (s) | EncoderLayer (rel.) |
|---|---|---|---|---|
| Original Module | 4.46 | 1 | 10.77 | 1 |
| Jit Scripted Module | 4.42 | 0.99 | 9.612 | 0.89 |
| Jit Module with Optimization | 2.9 | 0.65 | 5.775 | 0.53 |
| TensorRT | 4.34 | 0.97 | 4.875 | 0.45 |
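As an aside, reduced precision is often what unlocks most of TensorRT's advantage on attention-style blocks. A hedged sketch, assuming the torch_tensorrt 1.1 compile API and the model/input shapes from the script above (whether FP16 is numerically acceptable for this MHA module would need to be verified):

```python
import torch
import torch_tensorrt as torch_trt

# Spec for each of the three MHA inputs (query, key, value), matching t1's shape.
spec = torch_trt.Input((64, 1, 1280), dtype=torch.float32)

# Let TensorRT choose FP16 kernels where beneficial, falling back to FP32.
model_trt_fp16 = torch_trt.compile(
    model,
    inputs=[spec, spec, spec],
    enabled_precisions={torch.float32, torch.half},
)
```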
I'm getting this performance with your script:
Original Model 3.2848336696624756
Jit Script Model 2.7592527866363525
Jit Script Model after optimization 2.0758402347564697
TensorRT Model 1.4786508083343506
Is there any significant difference between your environment and mine?
Hi @geekinglcq, here are my env details:
PyTorch Version (e.g., 1.0): 1.11.0
CPU Architecture: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
OS (e.g., Linux): Linux, Ubuntu 20.04
How you installed PyTorch (conda, pip, libtorch, source): pip
Build command you used (if compiling from source): /
Are you using local sources or building from archives: yes
Python version: 3.8
CUDA version: 11.3
GPU models and configuration:
TensorRT version: 8.2.4.2
Torch_tensorrt version: 1.1.0
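For anyone else comparing setups, here is a quick way to dump the relevant versions on any machine (standard version attributes of each package; nothing project-specific):

```python
import torch
import tensorrt
import torch_tensorrt

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("GPU:", torch.cuda.get_device_name(0))
print("TensorRT:", tensorrt.__version__)
print("Torch-TensorRT:", torch_tensorrt.__version__)
```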
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
@geekinglcq have you found anything? I'm hitting the same problem, though with a different model and a different TensorRT version.