model-optimization
Weights in fully connected layers don't follow the TensorFlow Lite quantization spec (zero-point != 0)
1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.5 LTS
- TensorFlow installation (pip package or built from source): pip
- TensorFlow library (version, if pip package or github SHA, if built from source): TensorFlow 2.5.0
2. Code
Provide code to help us reproduce your issues using one of the following options:
- Demonstrate how to build your TF model: I downloaded the quantization-aware-training INT8 model from the google-research/mobilebert repo. The model download link is download link.
- Please follow this Colab page to convert the model.
- QAT INT8 MobileBERT TensorFlow model: download link. Untar the file; the model is in "mobilebert_squad_savedmodels/quant_saved_model".
- Converted INT8 TFLite model: download link
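For reference, the conversion step from the Colab roughly follows the standard `TFLiteConverter` flow. The sketch below uses a toy Keras model as a stand-in, since the real MobileBERT saved-model directory is only available via the download links above; the commented-out `from_saved_model` call shows where the real path would go.

```python
import tensorflow as tf

# Toy stand-in model; the real input is the quantization-aware-trained
# MobileBERT saved model from the links above.
model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build(input_shape=(1, 8))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# For the real model one would instead use:
# converter = tf.lite.TFLiteConverter.from_saved_model(
#     "mobilebert_squad_savedmodels/quant_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
print(len(tflite_model))  # non-empty flatbuffer on success
```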
3. Failure after conversion
- Model produces wrong results: the zero-points of FC-layer weight tensors are != 0, which violates the quantization spec.
- Failure to convert the model to TFLite: only TF 2.5.0 can successfully convert it to an INT8 TFLite model; in other words, TF 2.6 does not work.
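To make the spec violation concrete: the TFLite 8-bit quantization spec requires weights to be quantized symmetrically (zero-point fixed at 0), while activations use affine (asymmetric) quantization with a free zero-point. A small illustrative sketch (not TFLite source code; the example range is made up):

```python
def asymmetric_params(rmin, rmax, qmin=-128, qmax=127):
    """Affine (asymmetric) int8 params, as used for activations."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def symmetric_params(rmin, rmax, qmax=127):
    """Symmetric int8 params, required for weights: zero_point is always 0."""
    scale = max(abs(rmin), abs(rmax)) / qmax
    return scale, 0

# A hypothetical weight tensor with float range [-0.5, 1.0]:
print(symmetric_params(-0.5, 1.0))   # zero_point == 0: spec-compliant weight
print(asymmetric_params(-0.5, 1.0))  # zero_point == -43: what the FC layers show
```

Seeing a non-zero zero-point on an FC weight tensor therefore means the tensor was quantized activation-style rather than weight-style.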
@thaink @teijeong @daverim for visibility.
The weight input of the FC op is not actually a weight; that's why we don't use symmetric quantization. I think this is a corner case where TF EinsumDense is converted to TFLite FC ops, because TFLite doesn't have a matmul op, but it does seem to violate the quantization spec. @teijeong Do you have any idea why it stops working on TF 2.6+?
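The corner case described above can be sketched numerically: when the FC op's second input is really a runtime tensor (as with a lowered EinsumDense/matmul), both operands carry their own (scale, zero_point), and the integer kernel must subtract both zero-points. All numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
def quantize(r, scale, zp, qmin=-128, qmax=127):
    q = int(round(r / scale)) + zp
    return max(qmin, min(qmax, q))

# Two scalar operands, each with its own affine params:
sa, za = 0.02, 5      # ordinary activation
sb, zb = 0.01, -43    # FC "weight" that is actually an activation: zp != 0
a, b = 0.6, -0.8      # real (float) values

qa, qb = quantize(a, sa, za), quantize(b, sb, zb)
# Integer-domain product with zero-point correction on BOTH operands:
acc = (qa - za) * (qb - zb)
result = sa * sb * acc
print(result)  # -0.48, matching the float product a * b
```

So the math still works with a non-zero zero-point on the second operand; the issue is purely that the converter labels the tensor as an FC weight, which the spec says must be symmetric.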
@Xhark Thanks for your response! @teijeong Is there any update on this issue?