TinyNeuralNetwork
LayerNorm conversion error
Hi, in the latest version of TinyNeuralNetwork, LayerNorm causes the conversion to fail.
Error output:
Error in QNNPACK: failed to create add operator with 8.124962e-06 A-to-output scale ratio: scale ratio must be in [2**-14, 2**8) range
...
File "../TinyNeuralNetwork/tinynn/graph/quantization/modules.py", line 136, in forward
return self.f_add_2.add(norm_alpha, bias_fq_expand)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "../torch/ao/nn/quantized/modules/functional_modules.py", line 241, in add
r = ops.quantized.add(x, y, scale=self.scale, zero_point=self.zero_point)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "../torch/_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: createStatus == pytorch_qnnp_status_success INTERNAL ASSERT FAILED at "../aten/src/ATen/native/quantized/cpu/BinaryOps.cpp":204, please report a bug to PyTorch. failed to create QNNPACK Add operator
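The ratio QNNPACK complains about is the input ("A") scale of the quantized add divided by its output scale. As a minimal sketch of the same range check in plain Python, using the ratio value taken from the error message above (the exact scales will depend on the calibrated model):

# Illustrative only: re-run the range check QNNPACK applies when it creates the
# quantized add operator, with the ratio reported in the error message.
ratio = 8.124962e-06            # reported A-to-output scale ratio
lo, hi = 2.0 ** -14, 2.0 ** 8   # allowed range, roughly 6.1e-05 .. 256
print(lo <= ratio < hi)         # False: far below 2**-14, so the operator cannot be created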
torch version: 2.5.1, python version: 3.12
This should reproduce it:
import torch.nn as nn
import torch

from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.converter import TFLiteConverter
from tinynn.graph.tracer import model_tracer


class LayerNormModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_norm = torch.nn.LayerNorm(256)

    def forward(self, x: torch.Tensor):
        return self.layer_norm(x)


def _main():
    dummy_input = torch.rand(1, 60, 256).float()
    model = LayerNormModel()

    qat_config = {
        "backend": "qnnpack",
        "per_tensor": True,
        "disable_requantization_for_cat": True,
    }

    # Trace the model and prepare it for post-training quantization
    with model_tracer():
        quantizer = PostQuantizer(
            model, (dummy_input,), work_dir="LayerNormModel", config=qat_config
        )
        layer_norm_model = quantizer.quantize()

    # Calibration pass with the dummy input
    layer_norm_model(dummy_input)

    with torch.no_grad():
        layer_norm_model.eval()
        layer_norm_model.cpu()

        # Convert to a quantized model and export it to TFLite
        layer_norm_model = quantizer.convert(layer_norm_model)
        torch.backends.quantized.engine = quantizer.backend

        converter = TFLiteConverter(
            layer_norm_model,
            (dummy_input,),
            "layer_norm.tflite",
            fuse_quant_dequant=True,
            quantize_target_type="int8",
        )
        converter.convert()


if __name__ == '__main__':
    _main()
Is there a new flag or something I should set to make this work?
Yes, it looks like we will need to ignore this line during model conversion: https://github.com/alibaba/TinyNeuralNetwork/blob/main/tinynn/graph/quantization/modules.py#L124C32-L124C42
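In the meantime, a quick way to see which functional add ends up with the problematic output scale is to inspect the QFunctional modules after quantizer.convert(...) and before the TFLite conversion. This is only an illustrative sketch; the module name f_add_2 comes from TinyNeuralNetwork's rewritten LayerNorm and may differ in other models:

import torch.ao.nn.quantized as nnq

# After `layer_norm_model = quantizer.convert(layer_norm_model)`, print the
# output qparams of every quantized functional module (e.g. f_add_2) so the
# extreme scale that trips QNNPACK's [2**-14, 2**8) ratio check can be spotted.
for name, module in layer_norm_model.named_modules():
    if isinstance(module, nnq.QFunctional):
        print(name, float(module.scale), int(module.zero_point))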