
MobileBERT TFLite int8 model seems not to follow the quantization spec

Open rednoah91 opened this issue 3 years ago • 6 comments

The model downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite

Some of the fully-connected weights have a non-zero zero point (e.g. the weight bert/encoder/layer_0/attention/self/MatMul19 has zero-point = 6), which violates the TFLite quantization spec.

I am afraid this might cause issues in implementations that bypass the FC weight zero-point calculation.
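The zero points can be checked with a short script. This is a minimal sketch using the standard `tf.lite.Interpreter` tensor-details API; the filtering helper and the variable names are my own, and the model path is the file name from the issue:

```python
import numpy as np

def find_nonzero_zero_points(tensor_details):
    """Return (name, zero_points) pairs for tensors whose quantization
    zero point is not all-zero."""
    offenders = []
    for d in tensor_details:
        zps = np.asarray(
            d.get("quantization_parameters", {}).get("zero_points", [])
        )
        if np.any(zps != 0):
            offenders.append((d["name"], zps))
    return offenders

# To run this against the model from the issue (requires TensorFlow):
#   import tensorflow as tf
#   interp = tf.lite.Interpreter(
#       model_path="mobilebert_int8_384_20200602.tflite")
#   for name, zps in find_nonzero_zero_points(interp.get_tensor_details()):
#       print(name, zps)
```

Filtering the output for the fully-connected weight tensors shows the non-zero zero points described above.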

rednoah91 avatar Jun 09 '21 08:06 rednoah91

@rednoah91 Why are you using https://github.com/fatihcakirs/mobile_models vs https://github.com/mlcommons/mobile_models?

mcharleb avatar Jun 21 '21 23:06 mcharleb

@mcharleb The MobileBERT model in https://github.com/mlcommons/mobile_models points to https://github.com/fatihcakirs/mobile_models. They are the same.

rednoah91 avatar Jun 23 '21 06:06 rednoah91

> The model downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite
>
> Some of the fully-connected weights have a non-zero zero point (e.g. the weight bert/encoder/layer_0/attention/self/MatMul19 has zero-point = 6), which violates the TFLite quantization spec.
>
> I am afraid this might cause issues in implementations that bypass the FC weight zero-point calculation.

This model was provided by Google as a QAT model and was approved for use by the mobile working group.

Mostelk avatar Sep 29 '21 05:09 Mostelk

@jwookiehong @rnaidu02 , can you bring this up in the mobile group discussion? I think the group needs to bless this (or fix this)

willc2010 avatar Oct 26 '21 22:10 willc2010

@freedomtan, can you help with the question about the TFLite quant spec?

willc2010 avatar Oct 27 '21 23:10 willc2010

As @Mostelk mentioned, this one is a Quantization-Aware Training (QAT) quantized model provided by Google colleagues. The quantization spec mentioned by @rednoah91 is mainly for Post-Training Quantization (PTQ).
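The distinction matters because the spec's symmetric-weight rule pins the zero point at 0, while a generic asymmetric scheme (which a QAT pipeline may emit) does not. A minimal sketch contrasting the two, using my own helper names and per-tensor quantization for simplicity:

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    """Symmetric int8 quantization as the TFLite spec prescribes for
    weights: scale taken from max |w|, zero point fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale, 0                        # zero point is always 0

def quantize_asymmetric(w):
    """Asymmetric int8 quantization: the zero point shifts to cover the
    actual [min, max] range, so it is generally non-zero."""
    qmin, qmax = -128, 127
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```

For a weight tensor whose values are not centered around zero, the asymmetric path yields a non-zero zero point, which is consistent with the zero-point = 6 observed in the model.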

freedomtan avatar Oct 28 '21 03:10 freedomtan