QPyTorch
float_quantize with (8, 23) flips the sign of input values
Hi,
I have tried the following code:

```python
a = torch.tensor([3.0])
out = float_quantize(a, 8, 23, "nearest")
```
The output is printed as `-3.0`.
This happens only when the rounding mode is "nearest". I am not able to understand why this happens. Can you please explain it to me, in case I am missing something here?
what is printed out when you don't use nearest rounding?
When I use stochastic rounding, the input number is printed unchanged.
hi @ASHWIN2605
Good catch, I think this is an edge case. I'll look into the code soon.
But an 8-bit exponent with a 23-bit mantissa is the standard fp32 format anyway, so I don't think you need to quantize to it in the first place.
Hello,
This is from the `round_bitwise` function in quant_cpu.cpp.
Specifically, `rand_prob = 1 << (23 - man_bits - 1);` — when `man_bits = 23` this becomes `rand_prob = 1 << -1;`, and shifting by a negative count is undefined behavior in C++, so the resulting mask can be garbage (including ending up with the sign bit set).