GELU Plugin increases the inference time!
Description
When I use the GELU plugin in my project, it increases the inference time. Before adding the GELU plugin, inference time was 44 ms (fp32); after adding it, inference time is 102 ms (fp32). I'm very confused by this phenomenon.
Environment
TensorRT Version: 8.4.0.6
NVIDIA GPU: Tesla T4
NVIDIA Driver Version: 10.2
CUDA Version: 10.2
CUDNN Version: 8.3.2
Operating System: CentOS
Baremetal or Container (if so, version): No
Steps To Reproduce
(1) Use onnx-graphsurgeon to merge the LayerNorm and GELU subgraphs into plugin nodes (see the sketch after this list).
(2) Use trtexec to generate the TRT engine:
trtexec --onnx=./myonnx.onnx --saveEngine=./myengine.trt --plugins=./libgelu.so --plugins=./liblaynorm.so --verbose
(3) Use C++ to run inference with the engine (a Python sketch of the same flow also follows).
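Step (1) is not shown in the original report; below is a minimal onnx-graphsurgeon sketch of what such a merge typically looks like, assuming the graph contains the usual Erf-based GELU subgraph (Div -> Erf -> Add -> Mul -> Mul). The plugin op name ("CustomGeluPluginDynamic") and the position-based pattern matching are illustrative and must match your actual plugin creator and graph:

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("./myonnx.onnx"))

# Find each Erf node; assume the GELU pattern Div -> Erf -> Add -> Mul -> Mul
# (the exact ordering of the 0.5 multiply may differ in your export).
for erf in [n for n in graph.nodes if n.op == "Erf"]:
    div = erf.i(0)        # x / sqrt(2)
    add = erf.o(0)        # erf(...) + 1
    mul1 = add.o(0)       # x * (erf(...) + 1)
    mul2 = mul1.o(0)      # ... * 0.5
    x, y = div.inputs[0], mul2.outputs[0]

    # Insert one plugin node in place of the whole pattern; op and any
    # attributes must match the plugin creator registered by libgelu.so.
    graph.layer(op="CustomGeluPluginDynamic", name=erf.name + "_plugin",
                inputs=[x], outputs=[y])

    # Detach the replaced nodes so cleanup() removes them.
    for n in (div, erf, add, mul1, mul2):
        n.outputs.clear()

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "./myonnx_plugin.onnx")
```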
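Step (3) uses C++ in the original; as a sketch, here is the same flow with the TensorRT Python API, assuming static input shapes. The key detail is the same in both languages: the plugin .so files must be loaded before the engine is deserialized.

```python
import ctypes
import tensorrt as trt
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda

# Plugin creators must be registered before deserializing the engine.
ctypes.CDLL("./libgelu.so")
ctypes.CDLL("./liblaynorm.so")

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")

with open("./myengine.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One pinned host buffer and one device buffer per binding (static shapes).
buffers = []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    buffers.append((host, cuda.mem_alloc(host.nbytes)))
bindings = [int(dev) for _, dev in buffers]

# Fill the input host buffers with real data here, then run one inference.
stream = cuda.Stream()
for host, dev in buffers:
    cuda.memcpy_htod_async(dev, host, stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for host, dev in buffers:
    cuda.memcpy_dtoh_async(host, dev, stream)
stream.synchronize()
```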
Results:
(1) only layernorm: 44 ms (fp32) / 20 ms (fp16)
(2) layernorm + gelu: 102 ms (fp32) / 86 ms (fp16)
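One way to narrow this down (not shown above): trtexec can report per-layer timings for the engine, which shows whether the plugin kernel itself or broken fusions around it account for the difference (flags as in TensorRT 8.x trtexec; paths match the command above):
trtexec --loadEngine=./myengine.trt --plugins=./libgelu.so --plugins=./liblaynorm.so --dumpProfile --separateProfileRun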
@nvpohanh @kevinch-nv Is this the best practice?
It is not recommended to use the GELU plugin. TensorRT treats a plugin as an opaque layer, so it cannot fuse it with neighboring operations; the native, fused GELU pattern is usually faster than a plugin.
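As a sketch of the recommended alternative: keep GELU expressed as standard ONNX ops so TensorRT can fuse the Erf-based pattern into pointwise kernels instead of calling an opaque plugin. The helper below (names illustrative, assuming onnx-graphsurgeon) builds GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))):

```python
import numpy as np
import onnx_graphsurgeon as gs

def native_gelu(graph: gs.Graph, x):
    """Append GELU(x) = 0.5 * x * (1 + erf(x / sqrt(2))) as plain ONNX ops."""
    sqrt2 = np.array(np.sqrt(2.0), dtype=np.float32)
    one = np.array(1.0, dtype=np.float32)
    half = np.array(0.5, dtype=np.float32)
    t = graph.layer(op="Div", inputs=[x, sqrt2], outputs=["gelu_div"])[0]
    t = graph.layer(op="Erf", inputs=[t], outputs=["gelu_erf"])[0]
    t = graph.layer(op="Add", inputs=[t, one], outputs=["gelu_add"])[0]
    t = graph.layer(op="Mul", inputs=[x, t], outputs=["gelu_mul"])[0]
    return graph.layer(op="Mul", inputs=[t, half], outputs=["gelu_out"])[0]
```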
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!