                        Should I use pytorch-quantization or not?
Description
I have trained a model in PyTorch and exported it to ONNX. Now I want to run it on TensorRT with FP16. Should I use pytorch-quantization before using TensorRT, or will TensorRT automatically quantize the model when I enable FP16? If TensorRT quantizes automatically, what is the pytorch-quantization tool for?
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce
The pytorch-quantization tool is used for INT8 QAT. You don't need it if you just want to use FP16; TRT handles the FP32→FP16 conversion automatically.
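To illustrate what that automatic FP32→FP16 conversion means for stored values, here is a standalone numpy sketch (not TensorRT code; the weight values are made up for illustration):

```python
import numpy as np

# FP32 weights as they might come out of training (hypothetical values)
weights_fp32 = np.array([1.0e-4, 0.5, 3.14159265, 7.0e4], dtype=np.float32)

# Conceptually, FP16 mode rounds each stored value to the nearest
# representable float16
weights_fp16 = weights_fp32.astype(np.float16)

# Values within float16's range (~6.1e-5 .. 65504) survive with reduced
# precision; values beyond the max (here 7.0e4 > 65504) overflow to inf,
# which is why FP16 occasionally needs overflow-sensitive layers kept in FP32
print(weights_fp16)
```

This is also why FP16 needs no calibration data: it is a pure numeric-format change, unlike INT8, which needs per-tensor scales.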
If I use INT8 with calibration in TensorRT, do I still need this tool?
Please refer to https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#working-with-int8
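For intuition, the core of INT8 PTQ calibration can be sketched in numpy: run representative inputs through the network, record each tensor's dynamic range, and derive a scale mapping FP32 to INT8. This is a simplified symmetric max-calibration sketch; TensorRT's actual calibrators (e.g. entropy calibration) are more sophisticated:

```python
import numpy as np

def calibrate_scale(activations: np.ndarray) -> float:
    """Symmetric max calibration: map the observed range onto int8."""
    amax = float(np.abs(activations).max())
    return amax / 127.0  # 127 = largest positive int8 value

def quantize(x: np.ndarray, scale: float) -> np.ndarray:
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Stand-in for activations collected from a calibration batch
acts = np.random.default_rng(0).normal(0.0, 1.0, 1024).astype(np.float32)
scale = calibrate_scale(acts)

# Round-trip error is bounded by half a quantization step
err = np.abs(dequantize(quantize(acts, scale), scale) - acts).max()
print(f"scale={scale:.4f}, max round-trip error={err:.5f}")
```

With PTQ, TensorRT does this range collection for you at build time via a calibrator fed with representative data, so pytorch-quantization is not required for that workflow.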
Thanks a lot. I'm using PTQ now. PTQ and QAT are similar, right? 🤔 Do you have any statistics on which one is better (e.g., which gives higher mAP)? If you don't, that's all right; I'll try both of them in the future.
Try PTQ first; if PTQ doesn't satisfy the accuracy requirement, then you can try QAT.
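The key difference: PTQ picks scales after training, while QAT inserts "fake quantization" (quantize-then-dequantize) into the forward pass during training, so the network learns to compensate for rounding error; that is why QAT usually recovers accuracy when PTQ falls short. A minimal numpy sketch of the fake-quant operation (conceptually what pytorch-quantization inserts; in a real framework gradients flow through it via the straight-through estimator):

```python
import numpy as np

def fake_quant(x: np.ndarray, scale: float) -> np.ndarray:
    """Quantize to int8 and immediately dequantize back to float.

    The forward pass 'sees' quantization error, so the loss can
    drive the weights toward values that quantize well.
    """
    q = np.clip(np.round(x / scale), -128, 127)
    return (q * scale).astype(np.float32)

# Example: activations in [-1, 1] with a hypothetical scale of 1/127
x = np.linspace(-1.0, 1.0, 9, dtype=np.float32)
scale = 1.0 / 127.0
out = fake_quant(x, scale)
print(out)
```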
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!