model_optimization
uint8 quantization
Issue Type
Others
Source
pip (model-compression-toolkit)
MCT Version
1.8.0
OS Platform and Distribution
Linux version 3.10.0-327.36.3.el7.x86_64 ([email protected])
Python version
3.7
Describe the issue
My model is trained with PyTorch. Suppose I want to use MCT's PTQ method to quantize it to 8 bits and deploy the model on an edge device; how should I do this?
Thanks!
Expected behaviour
No response
Code to reproduce the issue
None
Log output
No response
Hello @1437539743, For now, MCT exports quantized models in a fakely-quantized manner (namely, the weights are quantized but have a float32 data type, and the activations are quantized using fake-quantization operations). However, we support the int8 data type in TFLite models; a usage example can be seen here. As for PyTorch models, the uint8 data type may be supported in future releases. In the meantime, you can access the quantization information (number of bits, thresholds, etc.) attached to each layer by setting the flag new_experimental_exporter when calling pytorch_post_training_quantization_experimental.
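As a concrete starting point, here is a minimal sketch of that flow. It assumes the MCT 1.8.0 top-level entry point pytorch_post_training_quantization_experimental and the new_experimental_exporter flag mentioned above; the model (torchvision's mobilenet_v2), the input shape, and the random calibration data are placeholders, so swap in your own trained model and real representative samples, and check your installed version's docs for the exact signature.

```python
import torch
import model_compression_toolkit as mct
from torchvision.models import mobilenet_v2  # placeholder for your trained model

# Float model trained in PyTorch (placeholder).
model = mobilenet_v2(pretrained=True)

# Representative dataset: a callable MCT invokes repeatedly during calibration.
# Each call should return one batch per model input; random data is used here
# only for illustration -- use real samples from your dataset in practice.
def representative_data_gen():
    return [torch.randn(1, 3, 224, 224)]

# Run post-training quantization. With new_experimental_exporter=True, the
# returned fakely-quantized model carries per-layer quantization information
# (bit widths, thresholds, etc.) that you can read off for deployment.
quantized_model, quantization_info = mct.pytorch_post_training_quantization_experimental(
    model,
    representative_data_gen,
    new_experimental_exporter=True,
)
```

The returned model's weights remain float32 tensors holding quantized values, so for an actual uint8 edge deployment you would translate the attached per-layer parameters into your target runtime's format yourself until native PyTorch integer export lands.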