WAGE.pytorch
Confusion about the quantization of activations.
Hey @stevenygd, I recently went through your codebase and the paper.
- One point I do not understand: the shifting operation for activations seems to be missing from this implementation. Only the gradient quantization performs the shift, to lift the scale of gradients, which are typically small. However, activations involve many MAC operations, so in my understanding the shift is also required there to keep the activation outputs from exploding, as stated in the paper (see the sketch after this list).
- The activation output seems to be clipped to the range [-1, 1], but I do not see any normalization operation that scales the inputs into that range. Outputs of convolution and FC layers can easily exceed it, so clipping this way seems likely to incur an accuracy loss. Am I misunderstanding something here?
- By the way, in this line the image inputs are rescaled to the range [-1, 1]. Is this step essential to training? (A small sketch of what I assume this step does is also included below.)
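
To make the first two questions concrete, here is a minimal sketch of the shift-and-clip activation quantization I would have expected, assuming the paper's Shift(x) = 2^round(log2 x) rounding and a per-layer scale `alpha`. The names `shift`, `quantize`, `quantize_activation`, and `alpha` are mine, not taken from this repository:

```python
import torch

def shift(x):
    # Snap a positive scalar to the nearest power of two: Shift(x) = 2 ** round(log2(x))
    return 2.0 ** torch.round(torch.log2(x))

def quantize(x, bits):
    # Q(x, k): round to a k-bit uniform grid, then clip into (-1, 1).
    # sigma is the smallest positive step of the grid.
    sigma = 2.0 ** (1 - bits)
    return torch.clamp(torch.round(x / sigma) * sigma, -1 + sigma, 1 - sigma)

def quantize_activation(a, bits, alpha):
    # My reading of the paper: activations are divided by a layer-wise
    # power-of-two scale alpha *before* clipping, so that the large MAC
    # outputs of conv/FC layers do not saturate the [-1, 1] range.
    # alpha here is a placeholder; as far as I understand, the paper derives
    # it from the weight-initialization limit via Shift(), and that is
    # exactly the step I cannot find in this repository.
    return quantize(a / alpha, bits)

# Toy usage: a pre-activation tensor whose values exceed [-1, 1].
a = 4.0 * torch.randn(2, 16)
alpha = shift(a.abs().max())          # one possible (assumed) choice of scale
a_q = quantize_activation(a, bits=8, alpha=alpha)
```

Without a scale like `alpha`, the clamp in `quantize` would saturate most of the conv/FC outputs, which is what I mean by accuracy loss in the second question.

And for the last question, this is roughly what I assume the input rescaling does; the helper name is hypothetical:

```python
import torch

def rescale_images(pixels):
    # Hypothetical helper (name is mine): map uint8 pixels in [0, 255]
    # to [-1, 1], so inputs already lie inside the quantizer's clipping range.
    return pixels.float() / 127.5 - 1.0

x = rescale_images(torch.randint(0, 256, (1, 3, 32, 32), dtype=torch.uint8))
```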
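If that reading is right, my question is whether training still works if the inputs are left in [0, 255] instead.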
Could you please help answer the questions above? Thanks a lot.