TensorRT
Why is YOLOv8 INT8 quantization via pytorch_quantization slower than a plain TensorRT `--fp16` build?

Device: NVIDIA Jetson NX
1. Using `trtexec --fp16`:

   `/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --workspace=4096 --saveEngine=best.engine --fp16`

   Measured inference latency: 36.8 ms
2. Using pytorch_quantization INT8:

   `/usr/src/tensorrt/bin/trtexec --onnx=best.onnx --saveEngine=v8s_ptq.engine --int8 --workspace=4096`

   Measured inference latency: 39.5 ms
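One likely factor: an ONNX exported from pytorch_quantization carries explicit QuantizeLinear/DequantizeLinear (Q/DQ) node pairs around each quantized layer. TensorRT must honor these placements, which constrains its layer fusion; any Q/DQ pair it cannot fuse becomes an extra elementwise pass over the tensor, and layers left outside Q/DQ scopes still run in FP16/FP32, so the INT8 math savings can be outweighed by the added ops. Below is a minimal sketch (assumptions: per-tensor, symmetric, 8-bit quantization with a `fake_quant_int8` helper I made up for illustration) of the round-trip each Q/DQ pair performs:

```python
import numpy as np

def fake_quant_int8(x, amax):
    """Simulate one Q/DQ pair: quantize to int8, then dequantize back.

    This mirrors the per-tensor symmetric scheme pytorch_quantization
    commonly uses; the helper name and exact details are illustrative,
    not the library's API.
    """
    scale = amax / 127.0                      # Q: map [-amax, amax] -> [-127, 127]
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale       # DQ: back to float for the next layer

x = np.array([0.5, -1.2, 3.3, 127.0], dtype=np.float32)
amax = float(np.abs(x).max())                 # calibration amax (max-abs here)
xq = fake_quant_int8(x, amax)
# Each such Q/DQ round-trip is an extra elementwise op the engine must
# either fuse into a neighboring kernel or execute as its own kernel;
# unfused pairs add latency that FP16 inference never pays.
```

The takeaway for debugging: compare per-layer timings of both engines (e.g. with `trtexec --dumpProfile`) to see which layers actually run in INT8 and where unfused Q/DQ or precision fallbacks appear.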