Awesome-Deep-Neural-Network-Compression
Gradient of TTQ
Hello, thanks for your re-implementation of quantization methods. I have a question about the gradients of TTQ. In your code, the gradient of a scale coefficient is the mean of the gradients of the pos/neg weights.
But I found that the official TensorFlow code, TernaryNet, sets the gradient of the scale coefficient to the sum of the gradients of the pos/neg weights. Is this a bug, or is there some other reason?
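For concreteness, a minimal sketch of the two variants (plain NumPy, not the actual code of either repo; I assume the mean is taken over the weights selected by the mask):

```python
import numpy as np

# Toy stand-ins (illustrative only; not the actual tensors of either repo).
grad_ternary_weight = np.random.randn(4, 4)   # dL/dw_ternary from backprop
pos_indices = np.random.rand(4, 4) > 0.7      # mask of weights quantized to +pos

# This repo: gradient of the scale coefficient as the MEAN over selected weights.
grad_pos_mean = (grad_ternary_weight * pos_indices).sum() / max(pos_indices.sum(), 1)

# Official TernaryNet: gradient of the scale coefficient as the SUM.
grad_pos_sum = (grad_ternary_weight * pos_indices).sum()
```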
Thanks a lot.
Hi @caiwenpu ,
Thanks for using my code!
But that should not be a big problem, right? It does not change the functionality of the method: since the mean is just the sum divided by the number of selected weights, using the mean is equivalent to using the sum with a correspondingly smaller learning rate for that scale coefficient, so the learning rate can do this averaging work.
Maybe you can try using the sum; I didn't pay much attention to these details at the time.
Thanks for pointing it out!
Best regards
Hi, sorry to bother you again. I found that the gradient of the full-precision weights is set by the following code:
```python
# Straight-through gradient for the pre-ternary (full-precision) weights:
# scaled by pos in the positive region, passed through unchanged in the
# pruned (zero) region, and scaled by neg in the negative region.
grad_fp_weight = pos * grad_ternary_weight * pos_indices + \
                 grad_ternary_weight * pruned_indices + \
                 neg * grad_ternary_weight * neg_indices
```
In this case, because `neg` is initialized to a random negative number, the gradient of the negative weights has the opposite sign of the corresponding gradient in TernaryNet. I am not sure if I am right; could you check it?
Thanks a lot.
Hi @caiwenpu ,
Sorry for the late reply. `neg_indices` selects the gradients that correspond to the weights quantized as negative, so the gradient of the negative weights is not merely the negation of the gradient in `grad_ternary_weight`.
The main logic is: first, the gradient of the post-ternary weights is obtained; the portion of it that falls on positive weights is used to update `pos`, and similarly for `neg`. Then the gradient of the pre-ternary weights is computed according to the chain rule.
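Roughly, a minimal NumPy sketch of that flow (toy values only; `delta` and the specific numbers are illustrative, not from the repo):

```python
import numpy as np

# Toy setup (illustrative; not the repo's actual tensors).
w_fp = np.random.randn(4, 4)                  # pre-ternary (full-precision) weights
delta = 0.5                                   # ternary threshold
pos_indices = (w_fp > delta)                  # weights quantized to +pos
neg_indices = (w_fp < -delta)                 # weights quantized to neg
pruned_indices = ~(pos_indices | neg_indices) # weights quantized to zero
pos, neg = 1.2, -0.8                          # scale coefficients (neg stored negative here)
grad_ternary_weight = np.random.randn(4, 4)   # dL/dw_ternary from backprop

# 1) Gradients of the scale coefficients, from the post-ternary gradient.
grad_pos = (grad_ternary_weight * pos_indices).sum()  # or divide by the count for the mean
grad_neg = (grad_ternary_weight * neg_indices).sum()

# 2) Gradient of the pre-ternary weights via the chain rule (straight-through estimator).
grad_fp_weight = pos * grad_ternary_weight * pos_indices + \
                 grad_ternary_weight * pruned_indices + \
                 neg * grad_ternary_weight * neg_indices
```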
I am not sure whether I have understood you correctly.
Best regards, Shangyu
Hi, thanks for your careful reply. Maybe my expression was not very clear. More precisely, in this code the gradient of the negative pre-ternary weights is set to `neg * grad_ternary_weight * neg_indices`, and the variable `neg` is a negative number.
In the official TTQ code, TernaryNet, the gradient of the negative pre-ternary weights is set to `w_n * grad_ternary_weight * neg_indices`, but the variable `w_n` is a positive number.
Therefore, the gradient of the negative pre-ternary weights in this code has the opposite sign of the one in the official TTQ code, TernaryNet. That is what confuses me.
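A tiny numeric illustration of the sign flip I mean (toy values; `g` stands for a single entry of `grad_ternary_weight` in the negative region):

```python
g = 0.3        # one entry of grad_ternary_weight where neg_indices is True

neg = -0.8     # this repo: neg is stored as a negative number
w_n = 0.8      # TernaryNet: w_n is stored as a positive magnitude

grad_this_repo  = neg * g    # -0.24
grad_ternarynet = w_n * g    # +0.24  -> opposite sign
```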
Thanks for your reply.