TensorRT
Performance comparison of the resnet18 model using PTQ INT8 quantization (FP16 vs. INT8)
Description
After using onnx-tensorrt to run PTQ INT8 quantization on the resnet18 model, I found that INT8 performance was the same as FP16 (batch size = 64). Is this expected? Have you compared the performance of INT8-quantized resnet18 against FP16, and do you have any data? Can you help confirm whether equal FP16 and INT8 performance is reasonable here?
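For reference, here is a minimal sketch of one way to time an FP16 build against an INT8 build with the TensorRT Python API. The file name `resnet18.onnx`, the TensorRT 7/8-era builder calls, and the bare-bones buffer handling are assumptions, not the exact setup from my run; a real PTQ build also needs an `IInt8Calibrator` attached where indicated.

```python
import time

import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, int8=False):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)      # FP16 baseline
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)  # allow INT8 kernels
        # config.int8_calibrator = MyCalibrator(...)  # your PTQ calibrator
    return builder.build_engine(network, config)

def benchmark(engine, iters=100):
    context = engine.create_execution_context()
    # Allocate a device buffer for every binding (inputs and outputs).
    bindings = []
    for i in range(engine.num_bindings):
        size = trt.volume(engine.get_binding_shape(i)) * np.float32().itemsize
        bindings.append(int(cuda.mem_alloc(size)))
    # Warm up, then time synchronous executions.
    for _ in range(10):
        context.execute_v2(bindings)
    start = time.perf_counter()
    for _ in range(iters):
        context.execute_v2(bindings)
    return (time.perf_counter() - start) / iters * 1e3  # ms per batch

fp16_engine = build_engine("resnet18.onnx", int8=False)
int8_engine = build_engine("resnet18.onnx", int8=True)
print("FP16: %.2f ms/batch" % benchmark(fp16_engine))
print("INT8: %.2f ms/batch" % benchmark(int8_engine))
```

In my case the two printed latencies came out essentially equal, which is what prompted this question.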