
Will MCT support int8 quantization?

Open Mukulareddy opened this issue 2 years ago • 4 comments

Hello, I am trying out MCT with int8 quantization, but I couldn't find any provision in the repo for int8 quantization. Kindly comment if anyone has tried this.

Thanks

Mukulareddy avatar Jun 16 '22 10:06 Mukulareddy

Hi,

MCT supports various quantization schemes, including int8, defined by the target platform capabilities (TPC). For more details about TPC, please see https://sony.github.io/model_optimization/api/api_docs/modules/target_platform.html#ug-target-platform.

In addition, we have implemented several target platform capabilities such as:

  1. TFLite
  2. QNNPACK
  3. Our default TPC

Also, there are tutorials on how to quantize for int8 using our default TPC in Keras (https://github.com/sony/model_optimization/blob/main/tutorials/example_keras_mobilenet.py) and PyTorch (https://github.com/sony/model_optimization/blob/main/tutorials/example_pytorch_mobilenet_v2.py).
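For readers landing here, a minimal sketch of what those tutorials do with the default TPC in Keras. The entry points `get_target_platform_capabilities` and `keras_post_training_quantization` are taken from current MCT releases; exact names and signatures may differ between versions, so treat this as an outline rather than the canonical API:

```python
# Minimal sketch: 8-bit post-training quantization with MCT's default TPC.
# API names assumed from current MCT releases; check your installed version.
import numpy as np
import model_compression_toolkit as mct
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

model = MobileNetV2()

def representative_data_gen():
    # Replace with a generator over real calibration images.
    yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# Default TPC: 8-bit weights and activations.
tpc = mct.get_target_platform_capabilities('tensorflow', 'default')

quantized_model, quantization_info = mct.keras_post_training_quantization(
    model,
    representative_data_gen,
    target_platform_capabilities=tpc)
```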

BTW, it is worth pointing out that the output of MCT is a fake-quantized model, meaning that the weights are quantized but kept in float32 data type, and the activations are quantized using FakeQuantization layers.

Can you please elaborate on what you mean by "provision", or does this answer your question?

haihabi avatar Jun 16 '22 13:06 haihabi

Hello, thanks for your comments. I am trying to quantize a model using MCT. After I quantize, the resulting model is in float32 data type. I would want the model in int8 data type so that the model size is reduced. My question is: can I do this using MCT?

It would be great & helpful if you could elaborate on the MCT fake quantization.

Mukulareddy avatar Jun 17 '22 04:06 Mukulareddy

Hi @Mukulareddy, Thank you for the question. What you're looking for is currently not supported in MCT, but we will take your feedback as a feature request and hopefully address it in a future update.

Regarding the fake quantization - MCT stores the quantized parameters after they've been quantized using a method similar to the one presented in the following link: tf/quantization/fake_quant_with_min_max_vars_per_channel. That is, the parameters are quantized and stored in float32 data type.
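For illustration only (this is plain TensorFlow, not MCT internals), here is what per-channel fake quantization looks like: values are snapped to an 8-bit grid, but the tensor remains float32:

```python
# Fake quantization: quantized values, float32 storage.
import tensorflow as tf

x = tf.random.uniform((1, 4, 4, 3), minval=-1.0, maxval=1.0)
min_vals = tf.constant([-1.0, -1.0, -1.0])  # per-channel range (last dim = 3)
max_vals = tf.constant([1.0, 1.0, 1.0])

fq = tf.quantization.fake_quant_with_min_max_vars_per_channel(
    x, min_vals, max_vals, num_bits=8)

print(fq.dtype)  # float32 -- values lie on an 8-bit grid but storage is float
```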

If I understand correctly, you are referring to something more similar to this technique: tf/quantization/quantize, where a specific dtype can be provided through the parameter T. As I mentioned, this type of quantization is currently not supported in MCT, but we will consider this for future updates.
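Again for illustration only (plain TensorFlow, not MCT), tf.quantization.quantize produces a tensor whose storage really is an integer dtype:

```python
# "Real" quantization: the output tensor has an integer dtype (T=tf.qint8).
import tensorflow as tf

x = tf.random.uniform((1, 4, 4, 3), minval=-1.0, maxval=1.0)

q = tf.quantization.quantize(x, min_range=-1.0, max_range=1.0, T=tf.qint8)

print(q.output.dtype)            # qint8 -- true 8-bit storage
print(q.output_min, q.output_max)  # the range actually used for quantization
```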

ofirgo avatar Jun 19 '22 07:06 ofirgo

Hi @Mukulareddy, a new method for exporting TFLite int8 models from MCT has recently been added and will be available in the upcoming release. Please keep in mind that this is an experimental feature and is subject to future changes.

You can find more information and a usage example here. If you have any questions or issues, please let us know.
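For anyone who wants a concrete picture of a true int8 TFLite file in the meantime, here is a standard TensorFlow Lite full-integer conversion sketch. Note that this uses the stock TFLite converter, not the new MCT exporter mentioned above:

```python
# Stock TFLite full-integer conversion (illustration only; not the MCT exporter).
import numpy as np
import tensorflow as tf

keras_model = tf.keras.applications.MobileNetV2()

def representative_dataset():
    # Replace with a generator over real calibration images.
    for _ in range(10):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open('model_int8.tflite', 'wb') as f:
    f.write(tflite_int8_model)  # weights and activations stored as int8
```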

reuvenperetz avatar Feb 08 '23 13:02 reuvenperetz