Does torch.export preserve the quantize_per_tensor/dequantize_per_tensor ops?
I was testing with the following:
import torch
from torchao.quantization.quant_api import (
    quantize_,
    int8_dynamic_activation_int8_weight,
    int4_weight_only,
    int8_weight_only,
    unwrap_tensor_subclass,
)

# define a simple floating point model to quantize
class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # self.conv = torch.nn.Conv2d(1, 1, 1)
        self.linear = torch.nn.Linear(4, 8)
        self.relu = torch.nn.ReLU()

    def forward(self, x):
        x = self.linear(x)
        x = self.relu(x)
        return x

# create a model instance and quantize the linear weights to int8
model = M()
model.eval()
quantize_(model, int8_weight_only())
# replace tensor subclasses with plain tensors so the model can be exported
model = unwrap_tensor_subclass(model)

input_fp32 = torch.randn(1, 1, 4)

# export to ONNX via the dynamo / torch.export path
program = torch.onnx.export(
    model,
    (input_fp32,),
    dynamo=True,
    report=True,
)
print(program)
In the exported program I don't see any quant/dequant ops. I was hoping they would be preserved so that converting to ONNX is easier. Or is there a different convention for representing the quantized operations?
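For reference, here is a minimal sketch of how I check which ops actually survive export. It assumes the quantized model and input_fp32 defined above, and it runs torch.export directly (roughly the path the dynamo ONNX exporter uses internally) and lists the call_function targets in the graph. Whether quantize_per_tensor/dequantize_per_tensor nodes, torchao affine-quant ops, or plain aten arithmetic show up presumably depends on the torchao version and the quantization API used, so treat this only as a way to inspect the graph, not as the expected output.

import torch

# Export the already-quantized model with torch.export and print the graph.
exported = torch.export.export(model, (input_fp32,))
print(exported.graph_module.graph)

# List the distinct ops that appear; any quantize/dequantize ops (e.g. from
# the quantized_decomposed or torchao namespaces) would be visible here.
ops = {str(node.target) for node in exported.graph.nodes if node.op == "call_function"}
for op in sorted(ops):
    print(op)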