Weight quantization using min-max calibration
Hi all,
I was trying to do symmetric weight-only quantization using min-max calibration. While computing the scale factors and zero-points with the min-max calibration method, when my floating-point weight range does not include floating-point zero, then in order to map floating-point zero to the symmetric zero, I normalize the floating-point values to the range [-1, 1] and then map these values to [-127, 127]. My questions are:
- If we normalize every layer to [-1, 1], the scale factor for every layer would be scale = (max - min) / (qmax - qmin) = (1 - (-1)) / (127 - (-127)) = 2/254, which is the same for all layers since I normalize all layers to [-1, 1] and then quantize them by dividing each floating-point value by the scale factor. Can we do it this way, given that the scale factors of all layers will be the same?
- Is there a correct way of normalizing the weights before quantization so that floating-point zero falls within the range of floating-point values?
- If I do not want to normalize, the only thing I can do is extend the floating-point range to [-max, max] to get symmetry. For example, if my floating-point weights are in the range [10, 15], I need to extend the range to [-15, 15] and then map it to [-127, 127], but in this case part of the quantization range is wasted, which degrades accuracy. How can I overcome this issue?
Hi @sandeep1404
I'm not sure if I have understood your first question correctly.
Assuming the min-max calibration scheme is symmetric, the scale is scale = absmax(x) / (2^(bitwidth - 1) - 1) and we get xint = round(x / scale). On the other hand, let's say you first normalize to [-1, 1], i.e. x_norm = x / absmax(x); mapping that to the signed integer grid [-127, 127] gives xint = round(x_norm * (2^(bitwidth - 1) - 1)). Both lead to the same result, xint = round(x / absmax(x) * (2^(bitwidth - 1) - 1)), right?
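To make the equivalence concrete, here is a minimal NumPy sketch (the weight values and bitwidth are made up for illustration and are not taken from AIMET):

```python
import numpy as np

x = np.array([0.3, -1.7, 0.05, 2.4])   # example FP32 weights (made-up values)
bitwidth = 8
qmax = 2 ** (bitwidth - 1) - 1          # 127 for a signed symmetric grid

# Path 1: symmetric min-max calibration
scale = np.abs(x).max() / qmax
xint_minmax = np.round(x / scale)

# Path 2: normalize to [-1, 1] first, then map to [-127, 127]
x_norm = x / np.abs(x).max()
xint_norm = np.round(x_norm * qmax)

print(xint_minmax)   # [ 16. -90.   3. 127.]
print(xint_norm)     # identical: [ 16. -90.   3. 127.]
```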
AIMET supports mainly two quantization calibration schemes to select the quantization range (qmin, qmax):

- Min-Max based: Also referred to as "TF" in AIMET. It covers the entire dynamic range of the tensor and uses the true min/max values of the tensor being quantized.
- SQNR based: Also referred to as "TF Enhanced" in AIMET. It finds qmin and qmax that minimize the Mean Square Error between the original tensor and the quantized tensor, to alleviate the issue of outliers. It is beneficial when the tensor has "long tails"; in such cases, the long tails can be ignored and the most effective range is used instead of the entire range (a simplified comparison is sketched below).
Again, coming back to symmetric quantization mode, you can use either signed or unsigned integer grids depending on the FP32 tensor values. For tensor distributions roughly centered around zero, signed symmetric quantization can be used to map to the quantized grid [-127, 127]. On the other hand, if your tensor has all positive values, e.g. [10, 15], then unsigned symmetric quantization mapping to [0, 255] may be better suited.
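Here is a minimal sketch of what that means for the all-positive range [10, 15] from your question (the individual weight values are made up):

```python
import numpy as np

w = np.array([10.0, 11.5, 13.2, 15.0])  # all-positive weights (made-up values)
bitwidth = 8

# Signed symmetric: the range is forced to [-15, 15], so the negative half
# of the grid and the codes below the smallest weight are never used.
scale_signed = np.abs(w).max() / (2 ** (bitwidth - 1) - 1)   # 15 / 127
wint_signed = np.round(w / scale_signed)                      # [ 85.  97. 112. 127.]

# Unsigned symmetric: the zero point stays at 0 and the grid is [0, 255],
# so more of the grid is available for the positive values.
scale_unsigned = w.max() / (2 ** bitwidth - 1)                # 15 / 255
wint_unsigned = np.round(w / scale_unsigned)                  # [170. 196. 224. 255.]

print(wint_signed, wint_unsigned)
```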
Hope this helps. Please let us know if you have further questions.