
result from aimet evaluation and result after quantization on 8295 don't match


Hi: I tried QAT on a model and exported the encodings. Then I used qnn-onnx-converter with --quantization_overrides and --input_list to carry the min/max/scale values from QAT into the converted model. However, even though the evaluation of the aimet model is very good, the result I get from inference on the 8295 is not. I'm not sure what is wrong.

By the way, in the json file generated by qnn-onnx-converter there is no batchnorm, even though there is batchnorm in the encoding file. The command I used is: qnn-onnx-converter --input_network xxx.onnx --quantization_overrides xxx.encodings --input_list xxx.txt
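For reference, a minimal sketch of the AIMET-side flow being described, assuming aimet_torch 1.x and a PyTorch model (MyModel, the input shape, and the export paths are illustrative, not from the issue):

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

model = MyModel().eval()                   # hypothetical trained fp32 model
dummy_input = torch.randn(1, 3, 224, 224)  # illustrative input shape

sim = QuantizationSimModel(model, dummy_input=dummy_input)

# Calibrate initial encodings, then fine-tune sim.model (QAT).
sim.compute_encodings(forward_pass_callback=lambda m, _: m(dummy_input),
                      forward_pass_callback_args=None)
# ... QAT training loop on sim.model goes here ...

# Export the ONNX model plus the .encodings file that is later passed
# to qnn-onnx-converter via --quantization_overrides.
sim.export(path='./export', filename_prefix='model_qat',
           dummy_input=dummy_input)
```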

superpigforever, May 08 '24

Hello @superpigforever, batchnorms are optimized out during conversion by folding the encoding values into the preceding conv2d (including depthwise and transpose variants) or fully connected layers. As such, the missing batchnorm operation is expected.

I would recommend a layer-wise comparison between the fp32 model and the QNN quantized model. That could help narrow down the source of the regression.
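As a first step off target, one way to do such a comparison is with forward hooks, checking the fp32 model against the AIMET quantsim layer by layer (a sketch, reusing the model and sim objects from above; note that quantsim wraps modules, so layer names may need mapping). The same idea applies to per-layer outputs dumped by the QNN tools.

```python
import torch

def capture_outputs(model, x):
    """Run the model once and record each leaf module's output tensor."""
    outputs, hooks = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                outputs[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        if not list(module.children()):  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return outputs

x = torch.randn(1, 3, 224, 224)
fp32_out = capture_outputs(model, x)       # fp32 reference
quant_out = capture_outputs(sim.model, x)  # quantized simulation

for name, ref in fp32_out.items():
    if name in quant_out and ref.shape == quant_out[name].shape:
        err = (ref - quant_out[name]).abs().max().item()
        print(f'{name}: max abs diff = {err:.6f}')
```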

quic-akinlawo, May 13 '24

Hi @superpigforever,

There are two points I would recommend checking:

1/ BN folding during QAT (using the method fold_all_batch_norms) => this is recommended to ensure consistency between QAT and hardware inference (see the sketch below).
2/ Ensure that the encodings in the cpp file generated by qnn-onnx-converter contain the encodings coming from aimet QAT.
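A sketch covering both points, assuming aimet_torch 1.x and the standard aimet encodings JSON layout with an activation_encodings section (model class, input shape, and file names are illustrative):

```python
import json
import torch
from aimet_torch.batch_norm_fold import fold_all_batch_norms

# Point 1: fold batchnorms into the preceding conv/linear layers
# *before* creating the QuantizationSimModel, so the QAT graph matches
# the BN-folded graph that qnn-onnx-converter produces.
model = MyModel().eval()  # hypothetical fp32 model
fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))
# ... build QuantizationSimModel on the folded model and run QAT ...

# Point 2: print a few activation encodings from the exported
# .encodings file (JSON) so they can be compared by eye against the
# values embedded in the generated .cpp file.
with open('xxx.encodings') as f:
    enc = json.load(f)
for name, entries in list(enc['activation_encodings'].items())[:5]:
    print(name, entries)
```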

e-said, May 14 '24