TensorRT
SiLU(Swish) Quantization with QDQ
Description
I am trying to quantize the swish (sigmoid + mul) operator to int8 using the trtexec tool, but the result has not been satisfactory.
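For readers unfamiliar with the operator, here is a minimal pure-Python sketch of what "sigmoid + mul" means: SiLU/swish is the input multiplied by its own sigmoid, which is why it shows up as two nodes (Sigmoid and Mul) in the exported ONNX graph. The function names are illustrative only and are not taken from the attached model.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU (swish): the Mul node multiplies x by the Sigmoid node's output."""
    return x * sigmoid(x)


if __name__ == "__main__":
    for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"silu({value:+.1f}) = {silu(value):+.6f}")
```

Both branches of the Mul consume the same input tensor, so any quantization scale attached to that input affects the sigmoid branch and the multiply branch at once.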
# trtexec command line
trtexec --verbose --nvtxMode=verbose --buildOnly --workspace=8192 --onnx=model.onnx --saveEngine=model.onnx.engine --timingCacheFile=./timing.cache --fp16 --int8
The original ONNX model structure (just remove the .zip extension from the attachment)
NetQuantizeSwish.onnx_simp.onnx.zip:
The original ONNX model structure with QDQ
NetQuantizeSwish_QDQ.onnx_simp.onnx.zip:
If I use trtexec to convert the ONNX model without QDQ, the result is very good: swish is quantized into a single PWN operator.
If I use trtexec to convert the ONNX model with QDQ, the result is bad.
I tried inserting QDQ at different positions (positions 1, 2, 3, 4), but I couldn't get swish converted into a separate PWN operator.
E.g., with QDQ inserted at positions 1 and 4, the result is:
I want to know how to insert the QDQ operators correctly so that swish is converted into a single PWN operator, and why that placement works.
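To make the question concrete, the following is a rough pure-Python sketch of what a Q/DQ pair computes numerically, following the ONNX QuantizeLinear/DequantizeLinear definitions with a zero point of 0 (symmetric int8). It only simulates the arithmetic: it does not build an ONNX graph, and it says nothing about how TensorRT fuses layers. Wrapping the block's input and output is an assumption made for illustration and may not match the numbered positions in the figures; the scale values are made up.

```python
import math


def quantize_linear(x: float, scale: float, zero_point: int = 0) -> int:
    """Q node: scale, round half to even (Python's round), saturate to int8."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize_linear(q: int, scale: float, zero_point: int = 0) -> float:
    """DQ node: map the int8 value back to a float."""
    return (q - zero_point) * scale


def fake_quant(x: float, scale: float) -> float:
    """A Q/DQ pair applied back to back."""
    return dequantize_linear(quantize_linear(x, scale), scale)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))


def silu_with_qdq(x: float, in_scale: float, out_scale: float) -> float:
    """Hypothetical placement: one Q/DQ on the shared input, one on the Mul output."""
    x_dq = fake_quant(x, in_scale)      # Q/DQ before Sigmoid and Mul
    y = x_dq * sigmoid(x_dq)            # swish on the dequantized value
    return fake_quant(y, out_scale)     # Q/DQ after Mul


if __name__ == "__main__":
    # Illustrative scales only; real scales come from calibration or QAT.
    print(silu_with_qdq(1.0, in_scale=0.05, out_scale=0.05))
```

In this simulation the rounding error of each Q/DQ pair is at most half a scale step unless the value saturates, which gives a quick way to compare candidate placements offline before building an engine.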
Environment
TensorRT Version: 8.2.4
NVIDIA GPU: 2080Ti
NVIDIA Driver Version: 535.54.03
CUDA Version: 11.6
Onnx Version: 1.13.1
Onnx Opset Version: 13
Operating System: ubuntu20.04
Python Version (if applicable): 3.8
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
@ttyio ^ ^
Hi, I'm new to TensorRT and I can't answer your question. Would you mind telling me how you get the tensorrt engine visualization image? It seems very useful.
@Garfield2005 sorry for the delayed response, could you upgrade your TRT version? @zhexinli we have a visualization tool at https://github.com/NVIDIA/TensorRT/tree/main/tools/experimental/trt-engine-explorer
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!