quantized.pytorch
Straight through estimator
I noticed that you don't cancel the gradient for large values when using the straight-through estimator here.
The QNN paper claims that "Not cancelling the gradient when r is too large significantly worsens performance".
Does this only matter for low-precision quantization (e.g., binary)?
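For context, here is a minimal sketch of the gradient cancellation the QNN paper describes, written as a standalone PyTorch `autograd.Function` (the class name `BinarizeSTE` is hypothetical and this is not the repo's actual implementation): the forward pass quantizes with `sign`, and the backward pass passes the gradient straight through but zeroes it wherever `|r| > 1`.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign quantization with the clipped straight-through estimator.

    Forward: q = sign(r).
    Backward: pass the incoming gradient through unchanged, but cancel
    (zero) it wherever |r| > 1, as prescribed in the QNN paper.
    NOTE: illustrative sketch only, not quantized.pytorch's code.
    """

    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return r.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (r,) = ctx.saved_tensors
        # Clipped STE: dq/dr is approximated by 1_{|r| <= 1},
        # so the gradient is cancelled for large |r|.
        return grad_output * (r.abs() <= 1).to(grad_output.dtype)

# Quick check of the cancellation behaviour:
x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)
BinarizeSTE.apply(x).sum().backward()
print(x.grad)  # tensor([0., 1., 1., 0.]) -- gradient cancelled where |x| > 1
```

Dropping the `(r.abs() <= 1)` mask in `backward` gives the unclipped variant this issue is asking about.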