
Using the TensorFlow quantization toolkit, how do I export to TFLite correctly?

batrlatom opened this issue

I am trying to do QAT training with a simple MobileNetV3-like model. Training goes well, but when I save the model as Keras and then convert it to TFLite, the quantization/dequantization nodes are still present, and they slow down inference on the TPU quite a lot. What I need is full integer quantization.

Is there any simple way (if this is possible in principle) to unwrap the model after training? If I understand correctly, Q/DQ nodes simply add noise during training so the network learns to overcome it. If I omit the Q/DQ nodes, the network should then convert correctly. Am I thinking about this correctly, or am I completely mistaken?
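
To illustrate what I mean by "adding noise": a Q/DQ pair quantizes a float tensor to an integer grid and immediately dequantizes it back, so the values stay float but pick up quantization error. A minimal sketch using TensorFlow's fake-quantization op (values and ranges made up purely for illustration):

```python
import tensorflow as tf

# A Q/DQ (fake-quantization) node snaps a float tensor onto a 2^num_bits grid
# and returns it as float again. During QAT the network learns to tolerate
# this quantization "noise".
x = tf.constant([-1.2, -0.3, 0.0, 0.4, 1.1])
x_qdq = tf.quantization.fake_quant_with_min_max_args(
    x, min=-1.0, max=1.0, num_bits=8
)
print(x_qdq)  # values rounded to the nearest 8-bit level, dtype still float32
```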

batrlatom avatar Jan 22 '24 14:01 batrlatom

@ttyio ^ ^

zerollzeng avatar Jan 24 '24 13:01 zerollzeng

@gcunhase ^ ^

ttyio avatar Mar 26 '24 17:03 ttyio

Hi @batrlatom,

This toolkit is for functional/sequential TF Keras models, so we did not test converting the quantized Keras model to TFLite and cannot predict what issues this could cause.

In any case, it seems that Q/DQ nodes are being added to your Keras model, so the toolkit seems to be functioning appropriately. If you do not wish for the model to have Q/DQ nodes, then you'll probably need to use a different toolkit, as the goal of this toolkit is to explicitly quantize models so they can be converted to TensorRT engines.
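
For reference, a minimal sketch of the workflow this toolkit targets. The `quantize_model` import follows the toolkit's documented usage, but treat the exact import path and the MobileNetV3 placeholder below as assumptions and check your installed version:

```python
import tensorflow as tf
from tensorflow_quantization import quantize_model  # per the toolkit docs; verify for your version

# Placeholder model standing in for your MobileNetV3-like network.
model = tf.keras.applications.MobileNetV3Small(weights=None)

q_model = quantize_model(model)   # inserts Q/DQ nodes into the Keras model
# ... fine-tune q_model (QAT) on your data ...
q_model.save("qat_saved_model")   # export as a SavedModel

# The SavedModel is then converted to ONNX (e.g. with tf2onnx) and built into a
# TensorRT engine, where the Q/DQ nodes drive explicit INT8 precision. TFLite is
# not part of this intended path.
```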

After a quick search, I could find the following TFLite quantization tutorials, as it seems that your objective is to work in TFLite (a minimal full-integer conversion example is sketched after the links):

  • https://www.tensorflow.org/lite/performance/post_training_quantization
  • https://www.tensorflow.org/lite/performance/post_training_integer_quant
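
Based on those tutorials, a minimal sketch of full-integer post-training quantization with the TFLite converter; the model, input shape, and calibration data below are placeholders you would replace with your own:

```python
import numpy as np
import tensorflow as tf

# Placeholder model and input shape; substitute your trained network.
model = tf.keras.applications.MobileNetV3Small(weights=None)

def representative_dataset():
    # Yield a handful of calibration samples matching the model's input shape.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force integer-only ops so the resulting model runs fully quantized on the TPU.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```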

Hope this helps.

gcunhase avatar Mar 26 '24 17:03 gcunhase

Closing since this is resolved, thanks all!

ttyio avatar Apr 30 '24 20:04 ttyio