daverim
You need to move `quantize_model` into the call or the init:

```python
self.my_model = quantize_model(model)
```
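A minimal sketch of that pattern, assuming a functional or Sequential Keras `base_model` and the `tfmot.quantization.keras.quantize_model` API (the wrapper class here is illustrative, not from the thread):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

class QuantizedWrapper(tf.keras.Model):
    def __init__(self, base_model):
        super().__init__()
        # Quantize once at construction time and keep the result as an
        # attribute, so its variables are tracked and saved with the model.
        self.my_model = quantize_model(base_model)

    def call(self, inputs, training=False):
        return self.my_model(inputs, training=training)
```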
@rino20, can you take over as sponsor?
> Sorry for all the comments

No, it's not ready for community review yet.
Also, I'm not sure what `quantize_model` is doing, but basically there are some tensors outside the function def that you want to save. The code I presented encapsulates...
Assuming you have implemented the default 8-bit scheme:

```python
for layer in keras_model.layers:
  if hasattr(layer, 'quantize_config'):
    for weight, quantizer, quantizer_vars in layer._weight_vars:
      quantized_and_dequantized = quantizer(
          weight, training=False, weights=quantizer_vars)
      min_var = ...
```
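Not from the thread, but as a hedged continuation of that loop: assuming the default 8-bit quantizer keeps its state under `'min_var'`/`'max_var'` keys in `quantizer_vars`, the stored range could be mapped to a TFLite-style scale and zero point roughly like this:

```python
# Assumed variable names; check your quantizer's build() for the actual keys.
min_val = float(quantizer_vars['min_var'].numpy())
max_val = float(quantizer_vars['max_var'].numpy())

# Asymmetric mapping of [min_val, max_val] onto the uint8 grid [0, 255].
# (Per the TFLite spec, weights are symmetric with zero_point = 0.)
scale = (max_val - min_val) / 255.0
zero_point = int(round(-min_val / scale))
```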
The documentation is basically the code -- in our case we use `fake_quant_with_min_max_args`: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/fake_quant_ops_functor.h. We currently default to an 8-bit scheme that matches TFLite: https://www.tensorflow.org/lite/performance/quantization_spec. I actually made a mistake...
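For reference (not from the thread), the calculation in that functor amounts to nudging the range so that zero is exactly representable, then rounding onto the integer grid. A rough NumPy sketch under those assumptions:

```python
import numpy as np

def fake_quant(x, min_val, max_val, num_bits=8, narrow_range=False):
    # Integer grid, e.g. [0, 255] for 8 bits.
    quant_min = 1 if narrow_range else 0
    quant_max = (1 << num_bits) - 1

    # Nudge min/max so the real value 0.0 lands exactly on a grid point.
    scale = (max_val - min_val) / (quant_max - quant_min)
    zero_point_from_min = quant_min - min_val / scale
    nudged_zero_point = int(np.clip(round(zero_point_from_min), quant_min, quant_max))
    nudged_min = (quant_min - nudged_zero_point) * scale
    nudged_max = (quant_max - nudged_zero_point) * scale

    # Quantize-dequantize: clamp, snap to the grid, map back to floats.
    clamped = np.clip(x, nudged_min, nudged_max)
    return np.round((clamped - nudged_min) / scale) * scale + nudged_min

# e.g. fake_quant(np.array([-1.2, 0.0, 0.7]), min_val=-1.0, max_val=1.0)
```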
That is right, the stored min/max will be incorrect if it is collected before folding but used after the batch norms are folded. It is probably simplest to just get the values after...
No, there is no folding emulation during training any more -- this is deprecated code, but it is still useful for understanding the calculation. This is now calculated during batchnorm folding in...
Sorry, edited the last comment to point to the current release tag. As I mentioned, that folding is no longer done in TF. However, the calculation is essentially the same...
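For completeness, a minimal sketch of that folding calculation (standard batch-norm folding, assuming a Keras-style kernel with output channels on the last axis; this is an illustration, not lifted from the TF source):

```python
import numpy as np

def fold_batch_norm(w, b, gamma, beta, moving_mean, moving_var, eps=1e-3):
    # Per-output-channel multiplier contributed by the batch norm.
    multiplier = gamma / np.sqrt(moving_var + eps)
    w_fold = w * multiplier                      # broadcasts over the last axis
    b_fold = beta + (b - moving_mean) * multiplier
    return w_fold, b_fold
```

This is why min/max ranges collected on the unfolded weights do not match the folded weights that are actually quantized.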
@xhark, could you update with the latest developments?