tf.quantization.quantize fails when converting to TFLite
1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 20.04
- TensorFlow installation (pip package or built from source): from pip package tf-nightly
- TensorFlow library (version, if pip package or github SHA, if built from source): 2.7.0-dev20210922
2. Code
I'm exporting a TFLite model with multiple signatures provided as concrete functions. In one of them I want to perform manual quantization using tf.quantization.quantize, since as far as I know the quantize operation exists in TFLite.
import tensorflow as tf

class TestModel(tf.keras.models.Model):

  @tf.function
  def quantize(self, value):
    # Range values are just an example for repro purposes.
    return tf.quantization.quantize(value, -1.0, 1.0, tf.qint8)

test_model = TestModel()
test_model.quantize(tf.random.uniform([10], -1.0, 1.0))  # Works fine.

signatures = [test_model.quantize.get_concrete_function(tf.TensorSpec([None, 10], tf.float32))]
converter = tf.lite.TFLiteConverter.from_concrete_functions(signatures, test_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
However, the conversion fails with the following error.
error: Failed to convert element type '!tf_type.qint8': Unsupported type
<unknown>:0: note: loc("StatefulPartitionedCall_2"): called from
<unknown>:0: error: invalid TFLite type: 'tensor<?x?x32x!tf_type.qint8>'
If instead I try to use tf.int8, then I get this other error.
TypeError: Value passed to parameter 'T' has DataType int8 not in list of allowed values: qint8, quint8, qint32, qint16, quint16
Since the operation actually exists in TFLite, could this be just a problem managing the output dtype argument?
It should be noted that I am fully aware of the inference_input_type and inference_output_type attributes in the converter. These are not what I'm asking about. I'm asking about explicitly running the quantize op within one of the model signatures.
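For clarity, this is the kind of converter-level configuration I mean, sketched from the documented converter API and assuming a representative_dataset generator is defined elsewhere. It quantizes the model's own input and output tensors, which is different from running a quantize op inside a signature:

import tensorflow as tf

# Converter-level quantization of the model's input/output tensors.
# This is NOT what I'm asking about; shown only for contrast.
converter = tf.lite.TFLiteConverter.from_concrete_functions(signatures, test_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # assumed defined elsewhere
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()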
Hi @leandro-gracia-gil,
As you observed, tf.quantization.quantize is not converted to a TFLite op. You can try fake_quant_with_min_max_args to actually quantize and dequantize the value (a minimal sketch follows below), but I can't recall an op that is converted to the TFLite quantize op. Note that TFLite's quantize op quantizes uniformly with a scale and zero point, while tf.quantize takes min and max values.
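Here's a minimal sketch of what I mean, based on your repro model (range values are placeholders; fake_quant_with_min_max_args quantizes and immediately dequantizes, so the result stays float32 rather than producing an integer tensor):

import tensorflow as tf

class TestModel(tf.keras.models.Model):

  @tf.function
  def fake_quantize(self, value):
    # Quantizes to 8 bits and dequantizes back, so the output dtype is float32.
    return tf.quantization.fake_quant_with_min_max_args(
        value, min=-1.0, max=1.0, num_bits=8)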
Can you elaborate on what you're trying to do, so that I can guide you to a more appropriate approach? Thanks.
Hi @teijeong,
I'm trying to export a TFLite model with 2 signatures, now that multiple signatures are supported in tf-nightly. One signature uses integer quantization and takes quantized inputs. The other is not quantized, but needs to produce quantized outputs that the first signature can consume. These quantized outputs would also be saved separately, so making the quantized signature take floats or somehow merging the 2 signatures into one are not viable options.
In theory I can retrieve the scale and zero point of the quantized inputs from the TFLite model itself, and from there I can compute min/max ranges. But I still need a way to manually quantize values, so I was trying with tf.quantization.quantize. Any suggestions on better ways to do this are most welcome.
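For reference, the workaround I had in mind looks roughly like this. It's only a sketch, assuming the relevant input is the first entry in get_input_details(); the actual index would need to be looked up for the quantized signature:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

# Index 0 is illustrative; the right entry depends on which input tensor
# belongs to the quantized signature.
input_details = interpreter.get_input_details()[0]
scale, zero_point = input_details['quantization']

def manual_quantize(values):
  # Uniform affine quantization using the parameters stored in the TFLite model.
  return np.clip(np.round(values / scale) + zero_point, -128, 127).astype(np.int8)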