quantization not happening?
Hello, I have tried to make my model compatible with QAT, following your guide. I started by defining a QuantizeConfig class:
import tensorflow as tf
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultConv2DQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        return [(layer.kernel, LastValueQuantizer(num_bits=4, symmetric=True, narrow_range=False, per_axis=False))]

    # Skip quantizing activations.
    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        # Add one line like this for each item returned in
        # `get_weights_and_quantizers`, in the same order.
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        # Empty since `get_activations_and_quantizers` returns an empty list.
        return

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return [MovingAverageQuantizer(num_bits=4, symmetric=False, narrow_range=False, per_axis=False)]

    def get_config(self):
        return {}
Then I applied the quantization:
def apply_mix_precision_QAT2(layer):
    # if isinstance(layer, tf.keras.layers.Dense):
    if isinstance(layer, tf.keras.layers.Conv2D):
        return tfmot.quantization.keras.quantize_annotate_layer(layer, quantize_config=DefaultConv2DQuantizeConfig())
    return layer

annotated_model = tf.keras.models.clone_model(model, clone_function=apply_mix_precision_QAT2)

with tfmot.quantization.keras.quantize_scope({'DefaultConv2DQuantizeConfig': DefaultConv2DQuantizeConfig}):
    model = tfmot.quantization.keras.quantize_apply(annotated_model)
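To sanity-check that the annotation took effect, one quick option (a minimal sketch, not from the original post) is to print the layer classes after quantize_apply; annotated Conv2D layers should show up wrapped by a tfmot quantize wrapper:

# Quick check: quantized layers are wrapped, so their class name typically
# contains "QuantizeWrapper". This only confirms the wrapper is present, not
# that the 4-bit scheme is applied to the stored weights.
for layer in model.layers:
    print(layer.name, type(layer).__name__)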
Finally, I chose one of the Conv2D layers, which evidently was quantized, and looked at its weights; they do not appear to have been quantized to a 4-bit encoding, as I specified in the QuantizeConfig.
Am I making a mistake somewhere? Or is there another way to check whether the layer was quantized? Thanks!
The QAT model is only for training, so the weights of a QAT model are kept in float form. For 8-bit quantization, we do the actual quantization during TFLite conversion. (Inference with the QAT model also just simulates the quantized model; it is not actually quantized. We use float32 ops with fake-quant to simulate it.)
You can get the fake-quantized weights manually, but they are still dequantized float32 values (from fake-quant):
unquantized_weight, quantizer, quantizer_vars = q_aware_model.layers[2]._weight_vars[0]
print(quantizer(unquantized_weight, training=False, weights=quantizer_vars))
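For reference, a minimal sketch of the conversion step where the actual (8-bit) quantization happens, assuming the standard post-QAT TFLite flow and using q_aware_model for the quantize-applied model (the output filename is just illustrative):

import tensorflow as tf

# The converter folds the fake-quant ranges learned during QAT into real
# quantized weights/ops; this produces the standard 8-bit TFLite model.
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

with open('qat_model.tflite', 'wb') as f:
    f.write(tflite_quant_model)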
@Xhark Ok, I think I understand. So I have a follow-up question: is validation done using the quantized version of the model (with the fake-quant weights), or with the unquantized weights?
@Xhark
So I tried what you suggested, and it appears the quantizer parameter (the second element of q_aware_model.layers[2]._weight_vars[0]) is None. This is odd, given that the layer is wrapped in the quantize wrapper and does have min_var and max_var values.
Do you happen to know what might cause the quantizer parameter to be None?
@Xhark So after some more checking, it appears that when I access the quantizer after reloading the model's parameters, namely:
checkpoint_path="/home/taaviv/dl-quantization/post_train_quant/best_checkpoint/cp.ckpt"
model = tf.keras.models.load_model(checkpoint_path)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]
the quantizer is None. In contrast, when I access the trained model before saving it (i.e., without going through the save and reload steps, reading _weight_vars[0] immediately after training), namely:
model = resnet50()
model.compile(...)
model.fit(...)
unquantized_weight, quantizer, quantizer_vars = model.layers[2]._weight_vars[0]
the quantizer is of type Quantizer, not None.
Do you happen to know what might cause this behaviour?
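For reference, here is a minimal sketch of reloading the model inside quantize_scope (as in the annotation step above), in case the deserialization path matters; whether this actually restores the quantizer object in _weight_vars is exactly the open question, so treat it as something to verify rather than a confirmed fix:

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Sketch: deserialize the QAT model inside quantize_scope so the custom
# QuantizeConfig and the tfmot wrapper classes are available during loading.
with tfmot.quantization.keras.quantize_scope(
        {'DefaultConv2DQuantizeConfig': DefaultConv2DQuantizeConfig}):
    reloaded = tf.keras.models.load_model(checkpoint_path)

unquantized_weight, quantizer, quantizer_vars = reloaded.layers[2]._weight_vars[0]
print(type(quantizer))  # check whether the quantizer survives the reload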