ZeroQ
Why do the weights need to be dequantized right after being quantized?
# map x onto the integer grid
new_quant_x = linear_quantize(x, scale, zero_point, inplace=False)
# clamp to the signed k-bit range [-2^(k-1), 2^(k-1) - 1]
n = 2**(k - 1)
new_quant_x = torch.clamp(new_quant_x, -n, n - 1)
# map the clamped integers back to floating point
quant_x = linear_dequantize(new_quant_x, scale, zero_point, inplace=False)
Doesn't this just give back floating-point weights?
From my point of view, the code released with most quantization papers uses a fake-quantization (quantize-dequantize) operation to simulate quantization. So the weights are still floating-point numbers; they are just restricted to the values that the k-bit integers can represent.
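In code, that quantize-clamp-dequantize round trip looks roughly like the sketch below. This is only an illustration of the fake-quantization idea under a simple linear scheme; fake_quantize and the example tensors are made-up names and values, not ZeroQ's actual linear_quantize/linear_dequantize helpers.

import torch

def fake_quantize(x, scale, zero_point, k):
    # Quantize: map float values onto the integer grid
    q = torch.round(x / scale + zero_point)
    # Clamp to the signed k-bit range [-2^(k-1), 2^(k-1) - 1]
    n = 2 ** (k - 1)
    q = torch.clamp(q, -n, n - 1)
    # Dequantize: map the clamped integers back to floating point
    return (q - zero_point) * scale

x = torch.tensor([0.07, -0.31, 0.52])
x_hat = fake_quantize(x, scale=torch.tensor(0.1), zero_point=torch.tensor(0.0), k=4)
print(x_hat.dtype)  # torch.float32 -- still floats, just snapped to multiples of the scale

The output is still a float tensor; only the set of values it can take is restricted to what k-bit integers could represent. That is what lets frameworks simulate quantization during training and evaluation (usually passing gradients through with a straight-through estimator) without needing integer kernels.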