Will MCT support int8 quantization?
Hello, I am trying out MCT using int8 quantization, but I couldn't find any provision in the repo for int8 quantization. Kindly comment if anyone has tried this.
Thanks
Hi,
MCT supports various quantization schemes, including int8, defined by the target platform capabilities (TPC). For more details about TPC, please see https://sony.github.io/model_optimization/api/api_docs/modules/target_platform.html#ug-target-platform.
In addition, we have implemented several target platform capabilities such as:
- TFLite
- QNNPACK
- Our default TPC
Also, there are tutorials on how to quantize for int8 using our default TPC in Keras (https://github.com/sony/model_optimization/blob/main/tutorials/example_keras_mobilenet.py) and PyTorch (https://github.com/sony/model_optimization/blob/main/tutorials/example_pytorch_mobilenet_v2.py).
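For reference, a minimal sketch of what those tutorials do. This assumes the `mct.ptq.keras_post_training_quantization` entry point from recent MCT releases; older releases expose `mct.keras_post_training_quantization` instead, and the exact signature may differ between versions:

```python
# Minimal int8 post-training quantization sketch with MCT's default TPC.
import numpy as np
import model_compression_toolkit as mct
from tensorflow.keras.applications import MobileNetV2

model = MobileNetV2()

def representative_data_gen():
    # Calibration batches; random data here only to keep the sketch self-contained.
    for _ in range(10):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# The default TPC already specifies 8-bit weights and activations.
quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
    model, representative_data_gen)
```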
BTW, it is worth pointing out that MCT produces a fake-quantized model, meaning that the weights are quantized but kept in float32 data type, and the activations are quantized using FakeQuantization layers.
Can you please elaborate on what you mean by "provision", or does this answer your question?
Hello, thanks for your comments. I am trying to quantize a model using MCT. After I quantize, the resulting model is in float32 data type. I would like the model in int8 data type so that the model size is reduced. My question is: can I do that using MCT?
It would be great and helpful if you could elaborate on MCT's fake quantization.
Hi @Mukulareddy, Thank you for the question. What you're looking for is currently not supported in MCT, but we will take your feedback as a feature request and hopefully address it in future updates.
Regarding fake quantization: MCT quantizes the parameters using a method similar to the one presented in the following link: tf/quantization/fake_quant_with_min_max_vars_per_channel. That is, the parameters are quantized but stored in float32 data type.
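As a small illustration of that point (a sketch using the TensorFlow op linked above, not MCT code): the fake-quant op snaps values to an 8-bit grid while the tensor stays float32.

```python
import tensorflow as tf

# A float32 "weight" tensor, fake-quantized to 8 bits over the range [-1, 1].
w = tf.random.uniform((3, 3, 16, 32), minval=-1.0, maxval=1.0)
fq_w = tf.quantization.fake_quant_with_min_max_vars(w, min=-1.0, max=1.0, num_bits=8)

print(fq_w.dtype)                                  # float32 -- the dtype is unchanged
print(len(set(fq_w.numpy().flatten().tolist())))   # at most 256 distinct values
```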
If I understand correctly, you are referring to something more similar to this technique: tf/quantization/quantize, where a specific dtype can be provided through the parameter T.
As I mentioned, this type of quantization is currently not supported in MCT, but we will consider this for future updates.
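For concreteness, a sketch of that second technique (plain TensorFlow, not MCT), where the output really carries an integer dtype selected through the parameter T:

```python
import tensorflow as tf

x = tf.random.uniform((4, 4), minval=-1.0, maxval=1.0)
q = tf.quantization.quantize(x, min_range=-1.0, max_range=1.0, T=tf.qint8)

print(q.output.dtype)               # qint8 -- values are stored as 8-bit integers
print(q.output_min, q.output_max)   # the range actually used for the mapping
```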
Stale issue message
Hi @Mukulareddy , A new method for exporting TFLite int8 models from MCT has recently been added and will be available in the upcoming release. Please keep in mind that this is an experimental feature and is subject to future changes.
You can find more information and a usage example here. If you have any questions or issues, please let us know.
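For anyone landing here before that release: the MCT exporter API isn't reproduced here, but as a rough illustration of what a full-integer int8 TFLite model involves, this is the standard TensorFlow Lite converter route (plain TensorFlow, not MCT's exporter):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2

model = MobileNetV2()  # any float Keras model

def representative_data_gen():
    # Calibration batches; random data only to keep the sketch self-contained.
    for _ in range(10):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open('model_int8.tflite', 'wb') as f:
    f.write(tflite_model)  # weights and activations stored as int8
```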