tf.quantization.quantize fails when converting to TFLite
1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
- TensorFlow installation (pip package or built from source): from pip package tf-nightly
- TensorFlow library (version, if pip package or github SHA, if built from source): 2.7.0-dev20210922
2. Code
I'm exporting a TFLite model with multiple signatures provided as concrete functions. In one of them I want to perform manual quantization using tf.quantization.quantize, since as far as I know the quantize operation exists in TFLite.
import tensorflow as tf

class TestModel(tf.keras.models.Model):

  @tf.function
  def quantize(self, value):
    # Range values are just an example for repro purposes.
    return tf.quantization.quantize(value, -1.0, 1.0, tf.qint8)

test_model = TestModel()
test_model.quantize(tf.random.uniform([10], -1.0, 1.0))  # Works fine.

signatures = [test_model.quantize.get_concrete_function(tf.TensorSpec([None, 10], tf.float32))]
converter = tf.lite.TFLiteConverter.from_concrete_functions(signatures, test_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
However, the conversion fails with the following error.
error: Failed to convert element type '!tf_type.qint8': Unsupported type
<unknown>:0: note: loc("StatefulPartitionedCall_2"): called from
<unknown>:0: error: invalid TFLite type: 'tensor<?x?x32x!tf_type.qint8>'
If instead I try to use tf.int8, then I get this other error.
TypeError: Value passed to parameter 'T' has DataType int8 not in list of allowed values: qint8, quint8, qint32, qint16, quint16
Since the operation actually exists in TFLite, could this be just a problem managing the output dtype argument?
It should be noted that I am fully aware of the inference_input_type and inference_output_type attributes in the converter. These are not what I'm asking about. I'm asking about explicitly running the quantize op within one of the model signatures.
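For clarity, this is the kind of converter-level configuration I mean, sketched from the documented converter API and assuming a representative_dataset generator is defined elsewhere. It quantizes the model's own input and output tensors, which is different from running a quantize op inside a signature:

import tensorflow as tf

# Converter-level quantization of the model's input/output tensors.
# This is NOT what I'm asking about; shown only for contrast.
converter = tf.lite.TFLiteConverter.from_concrete_functions(signatures, test_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # assumed defined elsewhere
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()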
Hi @leandro-gracia-gil,
As you observed, tf.quantization.quantize is not converted to a TFLite op. You can try fake_quant_with_min_max_args to actually quantize and dequantize the value (a minimal sketch follows below), but I can't recall an op that is converted to the TFLite quantize op. Note that TFLite's quantize op quantizes uniformly with a scale and zero point, while tf.quantize takes min and max values.
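Here's a minimal sketch of what I mean, based on your repro model (range values are placeholders; fake_quant_with_min_max_args quantizes and immediately dequantizes, so the result stays float32 rather than producing an integer tensor):

import tensorflow as tf

class TestModel(tf.keras.models.Model):

  @tf.function
  def fake_quantize(self, value):
    # Quantizes to 8 bits and dequantizes back, so the output dtype is float32.
    return tf.quantization.fake_quant_with_min_max_args(
        value, min=-1.0, max=1.0, num_bits=8)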
Can you elaborate on what you're trying to do, so that I can guide you to a more appropriate approach? Thanks.
Hi @teijeong,
I'm trying to export a TFLite model with 2 signatures, now that multiple signatures are supported in tf-nightly. One signature uses integer quantization and takes quantized inputs. The other is not quantized, but needs to produce quantized outputs that the first signature can consume. These quantized outputs would also be saved separately, so making the quantized signature take floats or somehow merging the 2 signatures into one are not viable options.
In theory I can retrieve the scale and zero point of the quantized inputs from the TFLite model itself, and from there I can compute min/max ranges. But I still need a way to manually quantize values, so I was trying with tf.quantization.quantize. Any suggestions on better ways to do this are most welcome.
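For reference, the workaround I had in mind looks roughly like this. It's only a sketch, assuming the relevant input is the first entry in get_input_details(); the actual index would need to be looked up for the quantized signature:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Index 0 is illustrative; the right entry depends on which input tensor
# belongs to the quantized signature.
input_details = interpreter.get_input_details()[0]
scale, zero_point = input_details['quantization']

def manual_quantize(values):
  # Uniform affine quantization using the parameters stored in the TFLite model.
  return np.clip(np.round(values / scale) + zero_point, -128, 127).astype(np.int8)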