model-optimization
Weights in fully connected layers don't follow the TensorFlow Lite quantization spec (zero-point != 0)
1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.5 LTS
- TensorFlow installation (pip package or built from source): pip
- TensorFlow library (version, if pip package or github SHA, if built from source): TensorFlow 2.5.0
2. Code
Provide code to help us reproduce your issues using one of the following options:
- Demonstrate how to build your TF model: I downloaded the quantization-aware-training INT8 model from the google-research/mobilebert repo. The model download link is download link.
- Please follow this Colab page to convert the model.
- QAT INT8 MobileBERT TensorFlow model: download link. Untar the file; the model is in "mobilebert_squad_savedmodels/quant_saved_model".
- Converted INT8 TFLite model: download link
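For reference, the conversion step from the Colab roughly follows the standard `TFLiteConverter` flow. The sketch below uses a toy Keras model as a stand-in, since the real MobileBERT saved-model directory is only available via the download links above; the commented-out `from_saved_model` call shows where the real path would go.

```python
import tensorflow as tf

# Toy stand-in model; the real input is the quantization-aware-trained
# MobileBERT saved model from the links above.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build(input_shape=(1, 8))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# For the real model one would instead use:
# converter = tf.lite.TFLiteConverter.from_saved_model(
#     "mobilebert_squad_savedmodels/quant_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(len(tflite_model))  # non-empty flatbuffer on success
```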
3. Failure after conversion
- Model produces wrong results: the zero-points of FC-layer weight tensors are != 0, which violates the quantization spec.
- Failure to convert the model to TFLite: only TF 2.5.0 can successfully convert it to an INT8 TFLite model; in other words, TF 2.6 does not work.
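To make the spec violation concrete: the TFLite 8-bit quantization spec requires weights to be quantized symmetrically (zero-point fixed at 0), while activations use affine (asymmetric) quantization with a free zero-point. A small illustrative sketch (not TFLite source code; the example range is made up):

```python
def asymmetric_params(rmin, rmax, qmin=-128, qmax=127):
    """Affine (asymmetric) int8 params, as used for activations."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def symmetric_params(rmin, rmax, qmax=127):
    """Symmetric int8 params, required for weights: zero_point is always 0."""
    scale = max(abs(rmin), abs(rmax)) / qmax
    return scale, 0

# A hypothetical weight tensor with float range [-0.5, 1.0]:
print(symmetric_params(-0.5, 1.0))   # zero_point == 0: spec-compliant weight
print(asymmetric_params(-0.5, 1.0))  # zero_point == -43: what the FC layers show
```

Seeing a non-zero zero-point on an FC weight tensor therefore means the tensor was quantized activation-style rather than weight-style.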
@thaink @teijeong @daverim for visibility.
The weight input of the FC op is not actually a weight; that's why we don't use symmetric quantization. I think this is a corner case where TF EinsumDense is converted to TFLite FC ops, because TFLite doesn't have a matmul op, but it does seem to violate the quantization spec. @teijeong Do you have any idea why it stops working on TF 2.6+?
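The corner case described above can be sketched numerically: when the FC op's second input is really a runtime tensor (as with a lowered EinsumDense/matmul), both operands carry their own (scale, zero_point), and the integer kernel must subtract both zero-points. All numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
def quantize(r, scale, zp, qmin=-128, qmax=127):
    q = int(round(r / scale)) + zp
    return max(qmin, min(qmax, q))

# Two scalar operands, each with its own affine params:
sa, za = 0.02, 5      # ordinary activation
sb, zb = 0.01, -43    # FC "weight" that is actually an activation: zp != 0
a, b = 0.6, -0.8      # real (float) values

qa, qb = quantize(a, sa, za), quantize(b, sb, zb)
# Integer-domain product with zero-point correction on BOTH operands:
acc = (qa - za) * (qb - zb)
result = sa * sb * acc
print(result)  # -0.48, matching the float product a * b
```

So the math still works with a non-zero zero-point on the second operand; the issue is purely that the converter labels the tensor as an FC weight, which the spec says must be symmetric.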
@Xhark Thanks for your response! @teijeong Is there any update on this issue?