TensorRT
Are torch.nn.functional methods automatically quantized by pytorch-quantization?
Description
I am trying to quantize a PyTorch model to INT8 to run with TensorRT. I have read these docs, and am still unclear on whether I have to write custom quantized implementations for torch.nn.functional methods, namely F.conv2d(), F.relu(), F.max_pool2d(), and F.interpolate(). I use all of these during inference and am concerned they are computed in FP32.
The forward-pass looks like this:
```python
def forward(self, x):
    conv_output = self.conv_features(x)                   # nn.Conv2d, nn.ReLU, nn.BatchNorm2d, nn.MaxPool2d
    distances = self._l2_convolution(conv_output)         # F.conv2d, F.relu
    similarities = self.distance_2_similarity(distances)  # torch.log()
    min_similarities = -F.max_pool2d(-similarities,
                                     kernel_size=(similarities.size()[2],
                                                  similarities.size()[3]))
    sim_score = min_similarities.view(-1, self.num_prototypes)
    upsampled_activation_pattern = F.interpolate(similarities, size=self.img_size, mode='bicubic')
    logits = self.last_layer(sim_score)                   # nn.Linear
    return logits, sim_score, upsampled_activation_pattern
```
Following this example, I perform quantization:
```python
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import calib
from pytorch_quantization.tensor_quant import QuantDescriptor
from pytorch_quantization import quant_modules

quant_desc_input = QuantDescriptor(calib_method='histogram')
quant_nn.QuantConv2d.set_default_quant_desc_input(quant_desc_input)
quant_nn.QuantLinear.set_default_quant_desc_input(quant_desc_input)
quant_nn.QuantMaxPool2d.set_default_quant_desc_input(quant_desc_input)
```
This is the command I use to convert to ONNX:
```python
torch.onnx.export(model, dummy_input, onnx_filename, verbose=False, opset_version=13, do_constant_folding=True)
```
This is the command I use to convert to a TensorRT engine:

```shell
trtexec --onnx=<file_name> --saveEngine=<file_name> --explicitBatch --int8
```
Thank you.
Add `quant_modules.initialize()`.
I have that in my code below. Does adding that quantize the functional methods? It definitely quantizes the nn modules correctly.
Please check our samples (https://github.com/NVIDIA/TensorRT/tree/release/8.6/tools/pytorch-quantization/examples) and documentation, e.g. https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/index.html#document-tutorials/creating_custom_quantized_modules
Closing since there has been no activity for more than 3 weeks; please reopen if you still have questions, thanks!