TensorRT
Using the TensorFlow quantization toolkit, how do I export to TFLite correctly?
I am trying to do QAT training with a simple MobileNetV3-like model. The training goes well, but when I save the model in Keras format and then convert it to TFLite, the quantization/dequantization nodes remain in the graph and slow inference on the TPU considerably. What I need is full integer quantization.
Is there any simple way (if this is possible in principle) to unwrap the model after training? If I understand it correctly, the Q/DQ nodes simply add noise during training so the network learns to overcome it. If I omit the Q/DQ nodes, the network should then convert correctly. Am I thinking about this correctly, or am I completely mistaken?
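For context, my export step is roughly the following (an illustrative sketch; the file names are placeholders, not my actual paths):

```python
import tensorflow as tf

# Load the QAT-trained Keras model (placeholder path).
qat_model = tf.keras.models.load_model("mobilenetv3_like_qat")

# Straightforward Keras -> TFLite conversion, no extra options.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
tflite_model = converter.convert()

with open("mobilenetv3_like_qat.tflite", "wb") as f:
    f.write(tflite_model)
# The resulting .tflite still carries the quantize/dequantize nodes mentioned above.
```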
@ttyio ^ ^
@gcunhase ^ ^
Hi @batrlatom,
This toolkit is for functional/sequential TF Keras models, so we have not tested converting the quantized Keras model to TFLite and cannot predict what issues that could cause.
In any case, it seems that Q/DQ nodes are being added to your Keras model, so the toolkit seems to be functioning appropriately. If you do not wish for the model to have Q/DQ nodes, then you'll probably need to use a different toolkit, as the goal of this toolkit is to explicitly quantize models so they can be converted to TensorRT engines.
After a quick search, I could find the following TFLite quantization tutorials, as it seems that your objective is to work in TFLite:
- https://www.tensorflow.org/lite/performance/post_training_quantization
- https://www.tensorflow.org/lite/performance/post_training_integer_quant
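Based on those tutorials, full integer quantization at conversion time looks roughly like the sketch below. This is only an outline: the model path, input shape, and random calibration data are assumptions for illustration, and this flow has not been tested with models produced by this toolkit.

```python
import numpy as np
import tensorflow as tf

# Placeholder path and input shape; adjust to your model.
model = tf.keras.models.load_model("mobilenetv3_like_keras_model")

def representative_dataset():
    # Yield calibration samples shaped like real inputs (224x224x3 assumed here);
    # in practice use a few hundred real preprocessed images, not random data.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict the converter to int8 builtin ops and int8 I/O so no float
# quantize/dequantize ops are left in the graph.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

The representative dataset drives the activation-range calibration, and forcing int8 input/output types avoids float conversion ops at the graph boundaries, which is what full integer targets such as edge TPUs generally expect.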
Hope this helps.
Closing since this is solved, thanks all!