TensorRT SiLU(Swish) Quantization with QDQ

Description

I am trying to quantize the swish (Sigmoid + Mul) operator to INT8 using the trtexec tool, but the result has not been satisfactory.

# trtexec command line
trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=model.onnx --saveEngine=model.onnx.engine --timingCacheFile=./timing.cache --fp16 --int8

The original ONNX model structure (attachment: NetQuantizeSwish.onnx_simp.onnx.zip, just remove .zip):

The original ONNX model structure with QDQ (attachment: NetQuantizeSwish_QDQ.onnx_simp.onnx.zip):

If I use trtexec to convert the ONNX model without QDQ, the result is very good: swish is quantized into a PWN operator.

If I use trtexec to transform the onnx model with QDQ, the result is bad.

I tried inserting QDQ at different positions (positions 1, 2, 3, 4), but I couldn't get swish converted into a separate PWN operator.

E.g., with QDQ inserted at positions 1 and 4, the result is:

I want to know how to insert the QDQ operators correctly so that swish is converted into a single PWN operator, and why that placement works.
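For reference, here is a minimal pure-Python sketch of what a Q/DQ pair does numerically around swish. It is only an illustration: the function names, the scale values, and the choice of placing Q/DQ on the input and output of swish are assumptions, not the actual positions 1-4 from the model, and it says nothing about how TensorRT fuses the layers. It assumes symmetric INT8 quantization (zero point 0) with round-half-to-even and saturation to [-128, 127].

```python
import math

def silu(x):
    """SiLU/Swish: x * sigmoid(x), i.e. the Sigmoid + Mul pair in the ONNX graph."""
    return x / (1.0 + math.exp(-x))

def fake_quant(x, scale):
    """Numerical effect of a QuantizeLinear -> DequantizeLinear pair (int8, zero point 0)."""
    q = round(x / scale)          # Python's round() rounds half to even
    q = max(-128, min(127, q))    # saturate to the int8 range
    return q * scale

def silu_qdq(x, in_scale, out_scale):
    """Swish with Q/DQ on its input and on its output (hypothetical placement)."""
    return fake_quant(silu(fake_quant(x, in_scale)), out_scale)

# Illustrative scales only; real scales come from calibration or QAT.
in_scale = 6.0 / 127
out_scale = 6.0 / 127

xs = [i / 10 for i in range(-60, 61)]
max_err = max(abs(silu_qdq(x, in_scale, out_scale) - silu(x)) for x in xs)
print(f"max abs error over [-6, 6]: {max_err:.4f}")
```

Comparing `silu_qdq` against `silu` this way only shows the rounding error introduced by the Q/DQ pairs themselves. Whether the engine runs swish as one INT8 pointwise layer still has to be checked in the built engine.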

Environment

TensorRT Version: 8.2.4
NVIDIA GPU: 2080Ti
NVIDIA Driver Version: 535.54.03
CUDA Version: 11.6
Onnx Version: 1.13.1
Onnx Opset Version: 13
Operating System: ubuntu20.04
Python Version (if applicable): 3.8

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Jan 24 '24 11:01 Garfield2005

@ttyio ^ ^

Jan 27 '24 07:01 zerollzeng

Hi, I'm new to TensorRT and I can't answer your question. Would you mind telling me how you get the tensorrt engine visualization image? It seems very useful.

Mar 06 '24 08:03 zhexinli

@Garfield2005 sorry for the delayed response, could you upgrade your TRT version? @zhexinli we have a visualization tool at https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer

Mar 09 '24 02:03 ttyio

Closing since there has been no activity for more than 3 weeks, please reopen if you still have questions, thanks!

Apr 16 '24 16:04 ttyio
