TinyNeuralNetwork
LayerNorm conversion error
Hi, in the latest version of TinyNeuralNetwork, LayerNorm causes the conversion to fail.
Error output:
Error in QNNPACK: failed to create add operator with 8.124962e-06 A-to-output scale ratio: scale ratio must be in [2**-14, 2**8) range
...
File "../TinyNeuralNetwork/tinynn/graph/quantization/modules.py", line 136, in forward
return self.f_add_2.add(norm_alpha, bias_fq_expand)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "../torch/ao/nn/quantized/modules/functional_modules.py", line 241, in add
r = ops.quantized.add(x, y, scale=self.scale, zero_point=self.zero_point)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "../torch/_ops.py", line 1116, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: createStatus == pytorch_qnnp_status_success INTERNAL ASSERT FAILED at "../aten/src/ATen/native/quantized/cpu/BinaryOps.cpp":204, please report a bug to PyTorch. failed to create QNNPACK Add operator
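The ratio QNNPACK complains about is the input ("A") scale of the quantized add divided by its output scale. As a minimal sketch of the same range check in plain Python, using the ratio value taken from the error message above (the exact scales will depend on the calibrated model):

# Illustrative only: re-run the range check QNNPACK applies when it creates the
# quantized add operator, with the ratio reported in the error message.
ratio = 8.124962e-06            # reported A-to-output scale ratio
lo, hi = 2.0 ** -14, 2.0 ** 8   # allowed range, roughly 6.1e-05 .. 256
print(lo <= ratio < hi)         # False: far below 2**-14, so the operator cannot be created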
torch version: 2.5.1, python version: 3.12
This should reproduce it:
import torch.nn as nn
import torch

from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.converter import TFLiteConverter
from tinynn.graph.tracer import model_tracer


class LayerNormModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_norm = torch.nn.LayerNorm(256)

    def forward(self, x: torch.Tensor):
        return self.layer_norm(x)


def _main():
    dummy_input = torch.rand(1, 60, 256).float()
    model = LayerNormModel()

    qat_config = {
        "backend": "qnnpack",
        "per_tensor": True,
        "disable_requantization_for_cat": True,
    }

    # Trace the model and prepare it for post-training quantization
    with model_tracer():
        quantizer = PostQuantizer(
            model, (dummy_input,), work_dir="LayerNormModel", config=qat_config
        )
        layer_norm_model = quantizer.quantize()

    # Calibration pass with the dummy input
    layer_norm_model(dummy_input)

    with torch.no_grad():
        layer_norm_model.eval()
        layer_norm_model.cpu()

        # Convert to a quantized model and export it to TFLite
        layer_norm_model = quantizer.convert(layer_norm_model)
        torch.backends.quantized.engine = quantizer.backend

        converter = TFLiteConverter(
            layer_norm_model,
            (dummy_input,),
            "layer_norm.tflite",
            fuse_quant_dequant=True,
            quantize_target_type="int8",
        )
        converter.convert()


if __name__ == '__main__':
    _main()
Is there a new flag or something I should set to make this work?
Yes, it looks like we will need to ignore this line during model conversion: https://github.com/alibaba/TinyNeuralNetwork/blob/main/tinynn/graph/quantization/modules.py#L124C32-L124C42
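In the meantime, a quick way to see which functional add ends up with the problematic output scale is to inspect the QFunctional modules after quantizer.convert(...) and before the TFLite conversion. This is only an illustrative sketch; the module name f_add_2 comes from TinyNeuralNetwork's rewritten LayerNorm and may differ in other models:

import torch.ao.nn.quantized as nnq

# After `layer_norm_model = quantizer.convert(layer_norm_model)`, print the
# output qparams of every quantized functional module (e.g. f_add_2) so the
# extreme scale that trips QNNPACK's [2**-14, 2**8) ratio check can be spotted.
for name, module in layer_norm_model.named_modules():
    if isinstance(module, nnq.QFunctional):
        print(name, float(module.scale), int(module.zero_point))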