model-optimization

Issue with the FAKE_QUANT result

Open jasonchenPJ opened this issue 3 years ago • 4 comments

Hi,

For this example:

```python
a = tf.fake_quant_with_min_max_vars([-1.0, 0.0, 1.0], min=-1, max=1)
sess = tf.Session()
sess.run(a)
```

the result is `array([-0.9960785, 0., 1.0039215], dtype=float32)`.

But according to the source code in `fake_quant_ops_functor.h`, the nudging is defined as:

```cpp
*scale = (max - min) / (quant_max_float - quant_min_float);
zero_point_from_min = quant_min_float - min / *scale;
nudged_zero_point = StdRound(zero_point_from_min);
*nudged_min = (quant_min_float - nudged_zero_point) * (*scale);
*nudged_max = (quant_max_float - nudged_zero_point) * (*scale);
```

So I think the result should be `[-1.0039215, 0., 0.9960785]`. Please help me resolve the confusion. Thanks.
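As a sanity check, the nudging formulas from the header can be evaluated in plain Python (a sketch, not the actual kernel; `math.floor(x + 0.5)` stands in for `StdRound`'s round-half-away-from-zero, which is valid for the positive values here):

```python
import math

# 8-bit defaults: quant_min = 0, quant_max = 255; user range is [-1, 1]
quant_min, quant_max = 0.0, 255.0
rmin, rmax = -1.0, 1.0

scale = (rmax - rmin) / (quant_max - quant_min)            # 2/255
zero_point_from_min = quant_min - rmin / scale             # 127.5 in float64
nudged_zero_point = math.floor(zero_point_from_min + 0.5)  # StdRound -> 128
nudged_min = (quant_min - nudged_zero_point) * scale       # ~ -1.0039216
nudged_max = (quant_max - nudged_zero_point) * scale       # ~  0.9960784

print(nudged_zero_point, nudged_min, nudged_max)
```

In float64 this matches the hand calculation above: a nudged range of roughly `[-1.0039215, 0.9960785]`, not the one TF printed.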

jasonchenPJ avatar Oct 05 '21 07:10 jasonchenPJ

@Xhark could you take a look at this question?

abattery avatar Oct 12 '21 04:10 abattery

It looks like there is a rounding/numerical-precision issue (e.g. `zero_point_from_min` should be 127.5, but it sometimes comes out as 127.49999).

AFAIK, the fake_quant op implementation has some numerical error:

[as-is]

```cpp
(clamped_shifted / nudged_scale_repl + 0.5f).floor() * nudged_scale_repl + nudged_min
```

vs. [more accurate]

```cpp
(clamped_shifted * inv_nudged_scale_repl - quant_zero + 0.5f).floor() * nudged_scale_repl
```

They are mathematically the same, but due to floating-point rounding the current implementation can have larger numerical error. However, changing this could break backward compatibility, so we are trying to find a way to reduce the error.

In the meantime, would you please change your model so that it is not sensitive to this kind of numerical error, if possible?
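The precision issue can be reproduced outside TF with NumPy (a sketch of the nudging arithmetic, not the actual kernel): the same formula yields a zero point of 128 in float64 but 127 in float32, because `-min / scale` lands just below 127.5 in float32.

```python
import math
import numpy as np

def nudged_zero_point(rmin, rmax, qmin=0.0, qmax=255.0, dtype=np.float64):
    """Sketch of the nudging arithmetic from fake_quant_ops_functor.h,
    parameterized by float precision (not the actual TF kernel)."""
    rmin, rmax, qmin, qmax = (dtype(x) for x in (rmin, rmax, qmin, qmax))
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point_from_min = qmin - rmin / scale
    # StdRound = round half away from zero; the values here are positive.
    return float(zero_point_from_min), math.floor(zero_point_from_min + dtype(0.5))

print(nudged_zero_point(-1.0, 1.0, dtype=np.float64))  # (127.5, 128)
print(nudged_zero_point(-1.0, 1.0, dtype=np.float32))  # (~127.4999924, 127)
```

With a zero point of 127 the nudged range becomes roughly `[-0.9960785, 1.0039215]`, which is exactly what `fake_quant_with_min_max_vars` returned above.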

Xhark avatar Oct 12 '21 09:10 Xhark

Thanks for your reply. With the current implementation, does that mean `zero_point_from_min` can come out as 127.5 or 127.49999 on different machines? That directly affects the result of fake_quant.

Actually, I insert the FQ op at the front of the DNN during the QAT stage, so the value range progresses: 0 to 255 (original image values) ==> -1 to 1 (after normalization) ==> -0.9960785 to 1.0039215 (after FQ) ==> DNN

I am just confused about how the zero-point value is decided to be 127 or 128 at inference time. If zero_point = 127, the value range is -127 to 128 (INT8 overflow); if zero_point = 128, the value range is -128 to 127.

Thanks~

jasonchenPJ avatar Oct 13 '21 03:10 jasonchenPJ

Hi, this confusion is probably due to the fact that numpy/python round halfway cases to the nearest even integer, while C++/Android rounds them away from zero. In your example, you probably expect zp = 0, but it is actually -1.
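For reference, Python's built-in `round()` (and `np.round`) resolve exact `.5` cases to the nearest even integer, while C++ `std::round` rounds them away from zero, so the two can disagree (a minimal illustration; `round_away` is a hypothetical helper mimicking `std::round`):

```python
import math

def round_away(v):
    # C++ std::round semantics: halfway cases rounded away from zero
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

for v in (126.5, 127.5, -127.5):
    print(v, round(v), round_away(v))
# 126.5 -> Python 126, C++ 127  (the two modes disagree here)
# 127.5 -> both 128; -127.5 -> both -128
```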

If you follow the logic (rounding halfway cases away from zero) you will see that the fake quant output is "correct". The nudging logic is a little confusing, but historically it was added to keep the zero point an exact int8 value and to choose the larger half of the range. For truly symmetric ranges, this can introduce some quantization error.

As for your second question: at inference, the zero-point value will never fall outside the int8 range [-128, 127], since the nudge requires the zero point to be <= int8_max and >= int8_min.
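The clamping described here can be sketched like this (a simplification of the header's nudging logic, not the actual kernel): `zero_point_from_min` is clamped into `[qmin, qmax]` before rounding, so the unsigned zero point always stays within the quantized range.

```python
import math

def nudge_zero_point(rmin, rmax, qmin=0, qmax=255):
    # Simplified from fake_quant_ops_functor.h (not the actual kernel):
    # the zero point is clamped into [qmin, qmax] before rounding,
    # so it can never fall outside the quantized range.
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point_from_min = qmin - rmin / scale
    if zero_point_from_min < qmin:
        return qmin
    if zero_point_from_min > qmax:
        return qmax
    return math.floor(zero_point_from_min + 0.5)  # round half away (positive here)

print(nudge_zero_point(0.0, 2.0))   # all-positive range: clamps/rounds to 0
print(nudge_zero_point(-2.0, 0.0))  # all-negative range: 255
print(nudge_zero_point(-1.0, 1.0))  # 128 (in float64 arithmetic)
```

In signed int8 terms the unsigned zero point maps to `zp - 128`, which always lies in `[-128, 127]`.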

Hope that helps

daverim avatar Dec 02 '21 09:12 daverim