
quantizing to int values

Open lovodkin93 opened this issue 4 years ago • 2 comments

Hello, I have used your QAT model to quantize to different bitwidths, but I saw that the quantized values were always FP values. For example, when I quantized to 4 bits, all my weights were quantized to 16 discrete values, but those values were floating-point numbers rather than integers.

I was wondering if there is a way to perform QAT with a quantization technique that quantizes to integers, so that it would be more hardware-efficient. Thank you.

lovodkin93 avatar Sep 30 '21 13:09 lovodkin93

Currently, TF does not have the kernels required to run all quantized operations, so we emulate quantization with float. To run a truly quantized model, our users generally convert to TFLite.
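This float emulation is often called "fake quantization": each weight is snapped to one of `2**num_bits` grid points but stored as a float. A minimal sketch of that quantize-dequantize round trip (illustrative names and a fixed `[w_min, w_max]` range are assumptions here, not the tfmot API) shows why the 16 values you saw were non-integer floats, and how the underlying integer code can still be recovered:

```python
# Sketch of "fake quantization" (quantize-dequantize), the scheme QAT
# emulates in float. Function names and the fixed range are illustrative.

def fake_quantize(w, num_bits=4, w_min=-1.0, w_max=1.0):
    """Snap a float weight onto a 2**num_bits-level grid, then map it back
    to float: the result is a float lying on a discrete grid, not an int."""
    levels = 2 ** num_bits - 1                # e.g. 15 steps for 4 bits
    scale = (w_max - w_min) / levels          # spacing between grid points
    q = round((w - w_min) / scale)            # integer code on the grid
    q = max(0, min(levels, q))                # clamp to the representable range
    return w_min + q * scale                  # dequantized float value

def to_int(w, num_bits=4, w_min=-1.0, w_max=1.0):
    """Recover the integer code behind a fake-quantized float weight."""
    scale = (w_max - w_min) / (2 ** num_bits - 1)
    return round((w - w_min) / scale)
```

For instance, with 4 bits over `[-1, 1]`, `fake_quantize(0.33)` lands on the grid point `1/3` (a float), whose integer code `to_int` recovers as `10`. The TFLite converter performs essentially this recovery to emit true integer tensors.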

However, if you have custom TF kernels you would like to use on your hardware, you are welcome to create the required scheme and registry. We are working on opening up the API to allow a custom quantization registry.
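Conceptually, such a registry maps layer types to a config describing which of their weights to quantize and how. The sketch below is a plain-Python illustration of that idea only; the class names (`QuantizeConfig`, `QuantizeRegistry`, the `Dense` stand-in) are hypothetical and deliberately simplified, not the actual tfmot interfaces:

```python
# Illustrative mini-registry (NOT the tfmot API): maps a layer class to a
# config saying which weight attributes to quantize and at what bitwidth.

class QuantizeConfig:
    """Hypothetical config: which layer attributes to quantize, at what width."""
    def __init__(self, weight_attrs, num_bits):
        self.weight_attrs = weight_attrs
        self.num_bits = num_bits

class QuantizeRegistry:
    """Hypothetical registry keyed by layer class, mirroring the idea of a
    per-layer-type quantization scheme."""
    def __init__(self):
        self._configs = {}

    def register(self, layer_cls, config):
        self._configs[layer_cls] = config

    def supports(self, layer):
        return type(layer) in self._configs

    def get_config(self, layer):
        return self._configs[type(layer)]

class Dense:
    """Stand-in for a Keras Dense layer, for illustration only."""
    pass

registry = QuantizeRegistry()
registry.register(Dense, QuantizeConfig(["kernel"], num_bits=4))
```

A real tfmot registry additionally has to say *how* each weight is quantized (the quantizer) and wire the quantized tensors back into the layer, but the lookup-by-layer-type structure is the core idea.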

daverim avatar Dec 02 '21 09:12 daverim

Can anyone help me with how to create that registry?

dhruven-god avatar Feb 26 '24 16:02 dhruven-god