Awesome-Deep-Neural-Network-Compression

Gradient of TTQ

Open caiwenpu opened this issue 5 years ago • 4 comments

Hello, thanks for your re-implementation of the quantization methods. I have a question about the gradients in TTQ. In your code, the gradient of the scale coefficient is the mean of the pos/neg weight gradients.

But I found that the official TensorFlow code, TernaryNet, sets the gradient of the scale coefficient to the sum of the pos/neg weight gradients. Is this a bug, or is it intentional for some other reason?
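
To make the difference concrete, here is how I read the two conventions (the names, shapes and values below are only illustrative, not copied from either code base):

    import torch

    # Stand-ins for the quantities in question
    grad_ternary_weight = torch.randn(64, 64)          # gradient w.r.t. the ternarized weights
    pos_indices = (torch.randn(64, 64) > 0.5).float()  # mask of weights quantized to the positive level

    # Your code (as I read it): gradient of the positive scale coefficient as a mean
    grad_pos_mean = (grad_ternary_weight * pos_indices).sum() / pos_indices.sum()

    # Official TTQ / TernaryNet: the same quantity as a plain sum
    grad_pos_sum = (grad_ternary_weight * pos_indices).sum()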

Thanks a lot.

caiwenpu avatar Sep 21 '19 01:09 caiwenpu

Hi @caiwenpu ,

Thanks for using my code! That shouldn't be a big problem, right? It won't change the functionality of the method; the learning rate can do the averaging work.
Maybe you can try using the sum instead. I didn't pay much attention to these details at the time. Thanks for pointing it out!
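
In other words, the sum is just the mean multiplied by the number of selected weights, so per step it only rescales the update, which the learning rate can absorb (rough arithmetic, not code from the repo):

    # Suppose 1000 weights sit at the positive level and their gradients average 0.01.
    grad_mean = 0.01
    grad_sum = 1000 * 0.01          # = 10.0

    # SGD step on the scale coefficient:
    #   lr * grad_sum  ==  (lr * 1000) * grad_mean
    # so using the sum is (per step) the same as using the mean with a 1000x larger lr;
    # the count of positive weights varies during training, so this is only approximate.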

Best regards

csyhhu avatar Sep 21 '19 02:09 csyhhu

Hi, sorry to bother you again. I found that the gradient of the full-precision weights is set by the following code:

        grad_fp_weight = pos * grad_ternary_weight * pos_indices + \
                         grad_ternary_weight * pruned_indices + \
                         neg * grad_ternary_weight * neg_indices

In this case, because neg is initialized to a random negative number, the gradient of the negative weights has the opposite sign of the corresponding gradient in TernaryNet. I am not sure if I am right; could you check it?

Thanks a lot.

caiwenpu avatar Sep 21 '19 03:09 caiwenpu

Hi @caiwenpu ,

Sorry for the late reply. neg_indices selects the gradients that correspond to weights quantized to the negative level, so the gradient of the negative weights is not simply the opposite of the gradient in grad_ternary_weight.

The main logic is: first obtain the gradients of the post-ternary weights; the gradients at the positive positions are used to update pos, and similarly for neg. The gradient of the pre-ternary weights is then computed via the chain rule.
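
In pseudo-code, the flow is roughly the following. This is only a sketch of how I recall the backward pass (the function and argument names are made up, so please refer to the actual code for details):

    import torch

    def ttq_backward(grad_ternary_weight, fp_weight, pos, neg, threshold):
        # Masks of where the full-precision weights were quantized to pos / pruned / neg
        pos_indices = (fp_weight > threshold).float()
        neg_indices = (fp_weight < -threshold).float()
        pruned_indices = 1.0 - pos_indices - neg_indices

        # Gradients of the scale coefficients: accumulate the post-ternary gradients
        # over their own regions (a mean here; the official TernaryNet code uses a sum)
        grad_pos = (grad_ternary_weight * pos_indices).sum() / pos_indices.sum().clamp(min=1)
        grad_neg = (grad_ternary_weight * neg_indices).sum() / neg_indices.sum().clamp(min=1)

        # Gradient of the pre-ternary (full-precision) weights via the chain rule:
        # scale the incoming gradient by the coefficient used in each region and
        # pass it straight through in the pruned region
        grad_fp_weight = pos * grad_ternary_weight * pos_indices + \
                         grad_ternary_weight * pruned_indices + \
                         neg * grad_ternary_weight * neg_indices
        return grad_fp_weight, grad_pos, grad_neg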

I am not sure whether I have understood your question correctly.

Best regards, Shangyu

csyhhu avatar Sep 23 '19 03:09 csyhhu

Hi, thanks for your careful reply. Maybe my wording was not very clear. More precisely, in this code the gradient of the negative pre-ternary weights is set to neg * grad_ternary_weight * neg_indices, where the variable neg is a negative number.

In the official TTQ code, TernaryNet, the gradient of the negative pre-ternary weights is set to w_n * grad_ternary_weight * neg_indices, where the variable w_n is a positive number.

Therefore, the gradient of the negative pre-ternary weights in this code has the opposite sign of the one in the official TTQ code. That is the point that confuses me.
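
To put numbers on it (purely illustrative values, not taken from either code base):

    # One weight that was quantized to the negative level, with incoming gradient 0.2
    grad_ternary_weight = 0.2
    w_n = 0.5    # TernaryNet: positive scale, applied as -w_n in the forward pass
    neg = -0.5   # this repo: the (negative) level stored directly

    grad_fp_ternarynet = w_n * grad_ternary_weight   # = +0.10
    grad_fp_this_repo = neg * grad_ternary_weight    # = -0.10, opposite sign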

Thanks for your reply.

caiwenpu avatar Sep 24 '19 13:09 caiwenpu