Manu Mathew
Hi, As you can see in the method merge_quantize_weights(), the parameters are quantized. So both parameters and activations are quantized. But the effects of both parameter quantization and activation quantization are removed...
Interesting discussion. Let's continue. You said: "we should correct the conv layer's w to wq". I think it is clear that the weights used in the forward pass in this code are...
Let me write it step by step, and you tell me in which step you think there should be a modification: 1. merge_quantize_weights will quantize the weights w to wq....
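To make step 1 concrete, here is a minimal sketch of what quantizing w to wq could look like, assuming a symmetric per-tensor scheme; the function name fake_quantize_weight and the scale choice are illustrative assumptions, not the repository's actual code.

```python
import torch

def fake_quantize_weight(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: quantize, then dequantize,
    # so wq stays a float tensor but only takes representable values.
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8 bits
    scale = w.abs().max() / qmax              # assumed per-tensor scale
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

w = torch.randn(16, 3, 3, 3)                  # a conv weight tensor
wq = fake_quantize_weight(w)                  # step 1: w -> wq
```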
From the paper that you quoted, section 3.2: "However, **we maintain weights in floating point** and update them with the gradient updates. This ensures that minor gradient updates gradually update...
Okay. It is clearer now. So what you are saying is that the backward computation should use wq and yq (but update w, as per the paper). But instead, what is happening is...
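A minimal sketch of that arrangement, reusing the assumed fake_quantize_weight helper from above: the forward (and hence backward) computation sees wq and yq, while the optimizer step updates the floating-point master weight w, as the paper prescribes.

```python
import torch
import torch.nn.functional as F

w = torch.randn(16, 3, 3, 3, requires_grad=True)   # float master weight
opt = torch.optim.SGD([w], lr=0.01)

x = torch.randn(2, 3, 32, 32)
# Forward uses the quantized value; .detach() makes the gradient
# flow to w as if quantization were the identity (the STE idea).
wq = w + (fake_quantize_weight(w) - w).detach()
y = F.conv2d(x, wq, padding=1)
yq = y + (fake_quantize_weight(y) - y).detach()    # activation fake-quant (same assumed helper)

loss = yq.pow(2).mean()                            # placeholder loss
loss.backward()                                    # gradients reach w through the STE
opt.step()                                         # the float w is what gets updated
```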
Interesting! Thanks for pointing this out - something to think about.
I have cleaned up the implementation of STE a bit for better understanding. If you have a float tensor "y" and a fake quantized tensor "yq", then to do STE,...
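The snippet below is a guess at the shape of that cleanup (the actual code is not quoted here): combine y and yq so the forward value equals yq while the gradient is computed as if the op were the identity on y.

```python
import torch

def apply_ste(y: torch.Tensor, yq: torch.Tensor) -> torch.Tensor:
    # Forward value == yq; in backward, the detached term contributes
    # nothing, so the gradient passes straight through to y.
    return y + (yq - y).detach()

y = torch.randn(4, requires_grad=True)
yq = torch.round(y * 4) / 4        # some fake-quantized version of y
out = apply_ste(y, yq)
out.sum().backward()
print(y.grad)                      # all ones: quantization bypassed in backward
```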
If you export the ONNX graph using the original https://github.com/open-mmlab/mmdetection, it results in a complicated graph for the final detection portion after the convolution layers. However, we have represented all the...
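For context, a plain torch.onnx.export call is sketched below; the torchvision model and input shape are placeholders standing in for a detector, and the mmdetection-specific export wrappers are not shown.

```python
import torch
import torchvision

# Placeholder model; the real detector's post-processing is what
# produces the complicated graph being discussed.
model = torchvision.models.resnet18(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=11,
)
```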
The quantization simulation required for QAT is done in PyTorch code. This may be the reason for the slowness. It will be faster if it is done in the underlying C++...
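As an illustration of the overhead: a Python-level fake-quantize step launches a handful of separate elementwise kernels per layer per iteration, whereas a backend implementation fuses them. PyTorch's fused torch.fake_quantize_per_tensor_affine operator is one existing way to push this into C++; the scale and zero-point values below are arbitrary.

```python
import torch

x = torch.randn(1, 64, 56, 56)
scale, zero_point, qmin, qmax = 0.05, 0, -128, 127

# Pure-Python simulation: divide, round, clamp, multiply -- several kernels.
xq_py = (torch.round(x / scale + zero_point).clamp(qmin, qmax) - zero_point) * scale

# The same computation as a single fused operator in the C++ backend.
xq_cpp = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, qmin, qmax)

print(torch.allclose(xq_py, xq_cpp))   # expected: True
```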
I believe there is some mistake, but this will not change the speed: QAT training will be slower than regular training. For QAT training, you need to give the model...