cuDLA-samples
Why should we use apply_custom_rules_to_quantizer?
In quantize.py I found the following function, and it is used in qat.py. Why do we need to find the quantizer pairs? And why do we set `major = bottleneck.cv1.conv._input_quantizer`, then `bottleneck.addop._input0_quantizer = major` and `bottleneck.addop._input1_quantizer = major`?
```python
def apply_custom_rules_to_quantizer(model: torch.nn.Module, export_onnx: Callable):
    # Apply rules to graph: export a temporary ONNX to discover the quantizer pairs.
    export_onnx(model, "quantization-custom-rules-temp.onnx")
    pairs = find_quantizer_pairs("quantization-custom-rules-temp.onnx")
    print(pairs)
    for major, sub in pairs:
        print(f"Rules: {sub} match to {major}")
        get_attr_with_path(model, sub)._input_quantizer = get_attr_with_path(model, major)._input_quantizer  # why use the same input_quantizer??
    os.remove("quantization-custom-rules-temp.onnx")

    for name, bottleneck in model.named_modules():
        if bottleneck.__class__.__name__ == "Bottleneck":
            if bottleneck.add:
                print(f"Rules: {name}.add match to {name}.cv1")
                major = bottleneck.cv1.conv._input_quantizer
                bottleneck.addop._input0_quantizer = major
                bottleneck.addop._input1_quantizer = major
```
Thanks.
If we use https://github.com/NVIDIA-AI-IOT/cuDLA-samples/tree/main/export#option1, the generated model can also run on the GPU. However, if the Q&DQ nodes of these tensors are inconsistent, the QAT model ends up with many useless int8->fp16 and fp16->int8 data conversions, which slows down model inference.
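To make this concrete, here is a minimal numerical sketch (not taken from the repo, plain PyTorch only) of what happens at the elementwise add in a Bottleneck. When both inputs share one input quantizer they share one scale, so the add can stay in the integer domain; when each input has its own scale, one side must be dequantized and re-quantized first, which is the extra reformat work mentioned above.

```python
# Hypothetical illustration: symmetric per-tensor int8 quantization of two
# tensors that feed an elementwise add.
import torch

def quantize(x: torch.Tensor, amax: float):
    """Symmetric int8 quantization: map [-amax, amax] to [-127, 127]."""
    scale = amax / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

x = torch.randn(4)
y = torch.randn(4)

# Case 1: both inputs use the SAME scale (shared input quantizer).
amax_shared = float(torch.max(x.abs().max(), y.abs().max()))
qx, s = quantize(x, amax_shared)
qy, _ = quantize(y, amax_shared)
# Pure integer add followed by a single rescale -- no fp conversion needed.
int8_sum = (qx.to(torch.int32) + qy.to(torch.int32)) * s
print("shared scale error    :", (int8_sum - (x + y)).abs().max().item())

# Case 2: each input has its OWN scale (independent quantizers).
qx2, sx = quantize(x, float(x.abs().max()))
qy2, sy = quantize(y, float(y.abs().max()))
# The int8 codes cannot be added directly; each side is first dequantized to
# floating point -- this is the int8->fp16 / fp16->int8 round trip in the engine.
fp_sum = qx2.float() * sx + qy2.float() * sy
print("per-tensor scale error:", (fp_sum - (x + y)).abs().max().item())
```

That is why apply_custom_rules_to_quantizer assigns the same input quantizer object to both inputs of addop: after calibration they carry identical amax/scale values, the exported Q&DQ nodes match, and the add can be fused in int8 without reformat layers.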