tf-approximate

Could a signed 8x8 multiplier work?

Open • johnzhou1996 opened this issue 4 years ago • 1 comment

Hi, my name is John. Thanks for open-sourcing your code. In your code, weights and activations are both quantized to UINT8 [0, 255], so unsigned 8x8 multipliers are used. If I want to quantize the weights and activations to [-127, 127] and use signed 8x8 multipliers instead, how should I adjust the code? Thanks again.

johnzhou1996 • Jul 09 '20 03:07

Hi John, the approach is exactly the same as the one I described in issue #5. The quantization works as follows: the minimum value is mapped to 0 and the maximum value is mapped to 255. You can simply shift this interval (for example to [-128, 127]), because the results are stored in the .bin file sequentially, one entry per operand pair.
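To illustrate the min/max mapping described above, here is a minimal C sketch. The function names quantize_u8 and quantize_s8 are hypothetical, and the clamping and round-half-up behavior are assumptions rather than something taken from tf-approximate itself. The signed variant is simply the unsigned code shifted down by 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: min/max quantization, min -> 0, max -> 255.
   Out-of-range inputs are clamped; rounding is half up (an assumption). */
static uint8_t quantize_u8(float x, float min, float max) {
    assert(max > min);
    float v = (x - min) / (max - min) * 255.0f;
    if (v < 0.0f) v = 0.0f;
    if (v > 255.0f) v = 255.0f;
    return (uint8_t)(v + 0.5f);
}

/* Hypothetical helper: the same interval shifted down by 128,
   so min -> -128 and max -> 127. */
static int8_t quantize_s8(float x, float min, float max) {
    return (int8_t)(quantize_u8(x, min, max) - 128);
}
```

For example, with min = -1.0 and max = 3.0, quantize_u8 maps -1.0 to 0 and 3.0 to 255, while quantize_s8 maps them to -128 and 127.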

However, you will then also have to shift each result from the signed interval [-32768, 32767] to the unsigned interval [0, 65535] before writing it.
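Here is a sketch of how the shifted results and the sequential layout fit together. It assumes the .bin file is exactly the one produced by the loop in the snippet at the end of this reply (outer loop over a, inner loop over b, both from -128 to 127, one uint16_t per pair). The helpers lut_index, encode_product, decode_product and lookup_product are hypothetical names for illustration; whatever code reads the table must use the same ordering and the same offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: position of the pair (a, b) in a table written with a as the
   outer loop and b as the inner loop, both running from -128 to 127. */
static uint32_t lut_index(int8_t a, int8_t b) {
    return (uint32_t)(a + 128) * 256u + (uint32_t)(b + 128);
}

/* Shift a signed product from [-32768, 32767] to [0, 65535] for storage. */
static uint16_t encode_product(int16_t p) {
    return (uint16_t)(p + 32768);
}

/* Undo the shift when reading a stored entry back. */
static int16_t decode_product(uint16_t stored) {
    return (int16_t)((int32_t)stored - 32768);
}

/* Look up a * b in a table of 256 * 256 entries loaded from the .bin file. */
static int16_t lookup_product(const uint16_t *table, int8_t a, int8_t b) {
    assert(table != NULL);
    return decode_product(table[lut_index(a, b)]);
}
```

The pair (-128, -128) lands at index 0 and (127, 127) at index 65535, so the file holds exactly 65536 entries.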

The only drawback of this type of quantization is that the float value 0.0 is generally not represented by the quantized value 0.
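To make the zero issue concrete, the sketch below computes which code the float 0.0 receives under min/max quantization. For contrast, it also shows a symmetric [-127, 127] scheme (scale = max|x| / 127), which is a different technique from the one described above but is one common way to use the [-127, 127] range from the question. Both functions are hypothetical illustrations, and the rounding modes are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: code assigned to the float 0.0 under min/max quantization
   (min -> 0, max -> 255), rounding half up. It is 0 only when min == 0. */
static int zero_code_u8(float min, float max) {
    assert(max > min && min <= 0.0f && max >= 0.0f);
    float v = (0.0f - min) / (max - min) * 255.0f;
    return (int)(v + 0.5f);
}

/* Hypothetical symmetric alternative: x in [-absmax, absmax] maps to
   [-127, 127], so 0.0 always maps to 0. Clamps, then rounds half away from zero. */
static int8_t quantize_sym_s8(float x, float absmax) {
    assert(absmax > 0.0f);
    float v = x / absmax * 127.0f;
    if (v > 127.0f) v = 127.0f;
    if (v < -127.0f) v = -127.0f;
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```

With min = -1.0 and max = 3.0, zero_code_u8 returns 64, so 0.0 lands on code 64 rather than 0; the symmetric scheme maps 0.0 to 0 at the cost of leaving one code (-128) unused.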

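#include <stdint.h>
#include <stdio.h>

// Placeholder with a hypothetical signature (exact product); replace with your own signed approximate multiplier.
static int16_t approximate_mult(int8_t a, int8_t b) { return (int16_t)(a * b); }

int main(void) {
    FILE *f = fopen("output.bin", "wb");
    if (f == NULL) return 1;
    // a and b must be signed: with unsigned int, -128 wraps to a huge value and the loop never runs
    for (int a = -128; a < 128; a++)
        for (int b = -128; b < 128; b++) {
            int16_t val = approximate_mult((int8_t)a, (int8_t)b); // replace by your own function call
            uint16_t val_u = (uint16_t)(val + 32768); // shift [-32768, 32767] to [0, 65535]
            fwrite(&val_u, sizeof(uint16_t), 1, f);
        }
    fclose(f);
    return 0;
}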

mrazekv • Jul 10 '20 11:07