
Full Int8 QAT not working

Open MATTYGILO opened this issue 2 years ago • 13 comments

Just a quick question. I want my final model to be fully int8 rather than float32 for inputs and outputs, and I want the training to be as accurate as possible. Do I train with quantised inputs and outputs? I have followed the common procedure in the comprehensive guide (with my custom model) and it hasn't worked. So:

  1. I trained using the comprehensive guide, modified for my model.
  2. After training, I used these settings to quantise my model (a typical representative_dataset generator is sketched after this list):
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
  3. When I go to evaluate the model, it is completely inaccurate.
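For reference, the representative_dataset passed to the converter above is typically a small generator over a few hundred real inputs. A minimal sketch, where calibration_samples is a hypothetical slice of the training data, preprocessed the same way as at inference time:

import tensorflow as tf

def representative_dataset():
    # Yield one sample at a time, as a list of float32 tensors
    # matching the model's input signature.
    for sample in calibration_samples[:200]:
        yield [tf.cast(sample[tf.newaxis, ...], tf.float32)]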

What do I need to do to allow for full int8 to work?

All help welcome

MATTYGILO avatar May 25 '22 18:05 MATTYGILO

Hi Matty, passing converter.representative_dataset = representative_dataset is only required for post-training quantization. If you want to use QAT, follow the guide at https://www.tensorflow.org/model_optimization/guide/quantization/training_example (use quantize_model before training, train with non-quantized input as usual, and then convert the model to TFLite).
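For reference, the end-to-end flow from that guide looks roughly like this. It is only a minimal sketch: model, train_images and train_labels are placeholders for your own float Keras model and training data, and the last four converter settings are the full-int8 options from the original question.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Wrap the float model with fake-quant ops, then fine-tune on ordinary float data.
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
qat_model.fit(train_images, train_labels, epochs=1, validation_split=0.1)

# Convert the QAT model. The quantization ranges were learned during training,
# so no representative_dataset is passed here.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_quant_model = converter.convert()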

thaink avatar May 27 '22 01:05 thaink

@thaink I have followed the guides. However, I'm using TFLite Micro, which requires full int8. None of the examples shows what to do to get full int8 input and output. Even if you do QAT, you still have to convert the model using post-training quantization, and there are no examples of int8 inputs and outputs for QAT.

MATTYGILO avatar May 27 '22 10:05 MATTYGILO

Setting inference_input_type and inference_output_type is actually what makes the converted model use int8 input and output.
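One thing worth checking when an int8-input/output model looks wildly inaccurate is how it is fed at evaluation time: the interpreter then expects inputs that are already quantized with the scale and zero point stored in the model, and returns int8 outputs that must be dequantized before comparing against the float model. A minimal sketch, assuming tflite_quant_model holds the converted flatbuffer and test_image is a float32 array with the model's input shape (without the batch dimension):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_quant_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Quantize the float input with the scale/zero-point stored in the model.
in_scale, in_zero_point = input_details["quantization"]
int8_input = np.clip(np.round(test_image / in_scale + in_zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(input_details["index"], int8_input[np.newaxis, ...])
interpreter.invoke()

# Dequantize the int8 output back to float before computing accuracy.
out_scale, out_zero_point = output_details["quantization"]
int8_output = interpreter.get_tensor(output_details["index"])
float_output = (int8_output.astype(np.float32) - out_zero_point) * out_scale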

thaink avatar May 30 '22 00:05 thaink

@thaink I've already set those values. Are you suggesting I train on quantised data?

MATTYGILO avatar May 30 '22 07:05 MATTYGILO

Can you share or describe what your output model looks like?

thaink avatar May 31 '22 00:05 thaink

@thaink I've converted the model with full int8, but the output of the model is complete rubbish. So I did QAT and converted that to full int8 as well, but the output is still complete rubbish.

MATTYGILO avatar May 31 '22 11:05 MATTYGILO

@thaink What is the suggested way of doing full int8 QAT on a model?

MATTYGILO avatar May 31 '22 11:05 MATTYGILO

@thaink This is how I do QAT:

import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # List all of your weights
    weights = {
        "kernel": LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)
    }

    # List of all your activations
    activations = {
        "activation": MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False)
    }

    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.weights.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.activations.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    def set_quantize_weights(self, layer, quantize_weights):
        # Add this line for each item returned in `get_weights_and_quantizers`
        # , in the same order

        count = 0
        for attribute in self.weights.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_weights[count])
                count += 1

    def set_quantize_activations(self, layer, quantize_activations):
        # Add this line for each item returned in `get_activations_and_quantizers`
        # , in the same order.
        count = 0
        for attribute in self.activations.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_activations[count])
                count += 1

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

from quant import DefaultDenseQuantizeConfig
from tensorflow_model_optimization.python.core.quantization.keras.quantize import quantize_scope
import tensorflow_model_optimization as tfmot


with quantize_scope({
    "DefaultDenseQuantizeConfig": DefaultDenseQuantizeConfig,
    "CustomLayer": CustomLayer
}):
    def apply_quantization_to_layer(layer):
        return tfmot.quantization.keras.quantize_annotate_layer(layer, DefaultDenseQuantizeConfig())

    annotated_model = tf.keras.models.clone_model(
        tflite_model,
        clone_function=apply_quantization_to_layer,
    )

    qat_model = tfmot.quantization.keras.quantize_apply(annotated_model)

    qat_model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
        loss="categorical_crossentropy",
        metrics=['accuracy']
    )

    qat_model.summary()

Please, I need all the help and advice I can get.

MATTYGILO avatar May 31 '22 11:05 MATTYGILO

@Xhark Could you check whether Matt is doing QAT the right way?

thaink avatar Jun 02 '22 02:06 thaink

Hi @MATTYGILO, I am experiencing the same problem. The full int8 QAT-derived TensorFlow Lite model (using a representative dataset to set the input and output to int8) doesn't seem to work; I am losing a lot of accuracy after the model conversion. I was wondering if you found a solution for this full int8 QAT model conversion. Thank you!

haozh7109 avatar Mar 06 '23 14:03 haozh7109

Thank you very much for your help. I am facing the same issue with MobileNetV3 (both with PTQ and QAT). Any ideas on why this might be the case? Thank you. @thaink

Alexey234432 avatar Jul 01 '23 14:07 Alexey234432

Hi, I am facing the same issue as well for QAT with MobileNetV3 (the accuracy of the QAT TFLite model is much lower than that of the corresponding QAT Keras model). Is there any fix for this yet?

tarushbansal avatar Jan 14 '24 16:01 tarushbansal