quantizing to int values
Hello, I have used your QAT model to quantize to different bitwidths, but I noticed that the quantized weights were always floating-point values. For example, when I quantized to 4 bits, all my weights were quantized to 16 discrete values, but those values were floats rather than integers.
I was wondering whether there is a way to perform QAT with a quantization technique that quantizes to integers, so that it would be more hardware-efficient. Thank you.
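For context, here is a minimal sketch of the behavior I am describing, assuming the wrappers behave like TF's fake-quant op (that is an assumption on my part, not something I have verified in the library's source):

```python
import tensorflow as tf

# Fake-quantize a float tensor to 4 bits: values snap to one of 2^4 = 16
# levels on [min, max], but the output dtype stays float32.
w = tf.random.uniform([8], minval=-1.0, maxval=1.0)
w_q = tf.quantization.fake_quant_with_min_max_args(w, min=-1.0, max=1.0, num_bits=4)

print(w_q)                           # discrete levels, but still float values
print(len(set(w_q.numpy())) <= 16)   # at most 16 distinct levels
```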
Currently, TF does not have the kernels required to run all quantized operations, so we emulate quantization with float. To run a truly quantized model, our users generally convert to TFLite.
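As a rough sketch of that flow (assuming a Keras model wrapped with `tfmot.quantization.keras.quantize_model`; the model and file names here are placeholders), converting with the TFLite converter produces a model whose weights are stored as integers. Note that TFLite's built-in integer kernels are primarily 8-bit, so a 4-bit scheme would still need custom kernels:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# A toy quantization-aware model; in practice this would be your trained QAT model.
model = tfmot.quantization.keras.quantize_model(
    tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))]))

# Convert to TFLite: the quantization parameters learned during QAT are used
# to emit a model with integer weights that runs on integer kernels.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('quantized_model.tflite', 'wb') as f:
    f.write(tflite_model)
```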
However, if you have custom TF kernels you would like to use on your hardware, you are welcome to create the required scheme and registry. We are working on opening up the API to allow a custom quantization registry.
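In the meantime, the public API does let you control how individual layers are quantized through a custom `QuantizeConfig`, following the pattern in the tfmot quantization-aware training comprehensive guide. This is not the full scheme/registry mechanism, but it shows the shape of the extension point. A minimal sketch, assuming a Dense layer and a hypothetical 4-bit config (the class name `Dense4BitQuantizeConfig` is mine):

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_apply = tfmot.quantization.keras.quantize_apply
quantize_scope = tfmot.quantization.keras.quantize_scope
LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer


class Dense4BitQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Hypothetical 4-bit config for Dense layers."""

    def get_weights_and_quantizers(self, layer):
        # Quantize the kernel to 4 bits, tracking min/max from the last batch.
        return [(layer.kernel,
                 LastValueQuantizer(num_bits=4, symmetric=True,
                                    narrow_range=False, per_axis=False))]

    def get_activations_and_quantizers(self, layer):
        # Quantize the activation output using a moving-average range.
        return [(layer.activation,
                 MovingAverageQuantizer(num_bits=4, symmetric=False,
                                        narrow_range=False, per_axis=False))]

    def set_quantize_weights(self, layer, quantize_weights):
        layer.kernel = quantize_weights[0]

    def set_quantize_activations(self, layer, quantize_activations):
        layer.activation = quantize_activations[0]

    def get_output_quantizers(self, layer):
        return []  # outputs are already covered by the activation quantizer

    def get_config(self):
        return {}


# Annotate the layers that should use the custom config, then apply QAT.
model = quantize_annotate_model(tf.keras.Sequential([
    quantize_annotate_layer(tf.keras.layers.Dense(10, input_shape=(4,)),
                            Dense4BitQuantizeConfig()),
]))

with quantize_scope({'Dense4BitQuantizeConfig': Dense4BitQuantizeConfig}):
    quant_aware_model = quantize_apply(model)
```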
Can anyone help me with how to create that registry?