TensorRT: I used pytorch-quantization to perform PTQ INT8 quantization on ResNet50, exported the model to ONNX, and then converted it to a TensorRT engine (engine.trt). During inference, the speed did not increase; it actually got slower. What went wrong?

Open jishenghuang opened this issue 10 months ago • 6 comments

TensorRT Version: 10.7

NVIDIA GPU: RTX 3090

NVIDIA Driver Version:

CUDA Version: 11.7

CUDNN Version:

Operating System:

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Model link:

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Dec 30 '24 02:12 jishenghuang
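
The issue does not include the commands that were used, so the following is only a minimal sketch of how the FP16 and INT8 engines could be built and profiled side by side with `trtexec`. The file names (`resnet50_fp32.onnx`, `resnet50_qdq.onnx`, and the `.trt` outputs) and the helper `trtexec_command` are hypothetical, not taken from the issue. The sketch assumes the quantized ONNX contains Q/DQ nodes, and it passes `--fp16` together with `--int8` so that layers left unquantized are not restricted to FP32, which is one thing that may be worth ruling out when an INT8 engine is slower than expected.

```python
import shlex


def trtexec_command(onnx_path, engine_path, precision):
    """Build a trtexec argument list for one precision mode.

    precision is one of "fp32", "fp16", "int8".
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
    ]
    if precision == "fp16":
        cmd.append("--fp16")
    elif precision == "int8":
        # Assumption: the ONNX file carries Q/DQ nodes from pytorch-quantization.
        # --fp16 is added so non-quantized layers may use FP16 instead of FP32.
        cmd += ["--int8", "--fp16"]
    elif precision != "fp32":
        raise ValueError(f"unknown precision: {precision}")
    # Per-layer timing, to see which layers dominate the latency.
    cmd += [
        "--dumpProfile",
        "--separateProfileRun",
        "--profilingVerbosity=detailed",
    ]
    return cmd


# Hypothetical file names, used for illustration only.
fp16_cmd = trtexec_command("resnet50_fp32.onnx", "resnet50_fp16.trt", "fp16")
int8_cmd = trtexec_command("resnet50_qdq.onnx", "resnet50_int8.trt", "int8")

for cmd in (fp16_cmd, int8_cmd):
    # Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Comparing the per-layer profiles of the two runs should show whether the quantized convolutions actually run as INT8 kernels or whether extra reformat or Q/DQ layers account for the added time.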