
Did you improve the performance using per-channel in weight quantization?

Open talenz opened this issue 3 years ago • 4 comments

Hi, great implementation! Since per-channel weight quantization is implemented in your code, I'm wondering if there is any improvement compared to per-tensor weight quantization.

talenz avatar Apr 14 '21 03:04 talenz

I have tried it on ResNet/ImageNet, but I found that selecting the initial value of the hyperparameter s is very tricky. I am not sure how I should modify the original expression (self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))). I have tried a few variants, but they cannot reach an accuracy as high as the original one (i.e., without per-channel quantization).

(Also, I don't have enough GPUs to run many experiments, and it eats up a lot of my spare time. :D)
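For reference, here is a minimal sketch of what a per-channel version of that initialization might look like (an illustration only, not this repo's exact code; `x` and `thd_pos` follow the names in the expression above):

```python
import torch as t

# Sketch: one learned step size s per output channel instead of per tensor.
# x is the conv weight of shape (out_ch, in_ch, kH, kW); thd_pos is the
# positive quantization bound, e.g. 2**(bits - 1) - 1 for signed weights.
def init_s_per_channel(x, thd_pos):
    # mean |w| over every dim except the output-channel dim
    mean_abs = x.detach().abs().mean(dim=list(range(1, x.dim())), keepdim=True)
    return t.nn.Parameter(mean_abs * 2 / (thd_pos ** 0.5))  # shape (out_ch, 1, 1, 1)
```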

zhutmost avatar Apr 14 '21 06:04 zhutmost


I used your implementation on MobileNetV2@ImageNet and only quantized the conv weights to 4 bits (the fc weights and activations stay in float). It didn't work well: even with per-channel quantization the top-1 accuracy only reaches about 68% (the float baseline is 71.88%). Any advice?

talenz avatar Apr 14 '21 09:04 talenz


You can try modifying 1) the scaling factor of the gradients, and 2) the initialization value of s. You can also read another paper, LSQ+ (https://arxiv.org/abs/2004.09576), which analyzes the disadvantages of LSQ and gives some advice.
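To make 1) concrete, here is a rough sketch of how the LSQ gradient scale g = 1 / sqrt(N * Q_p) could be adjusted when s is per-channel (my own illustration under those assumptions, not necessarily this repo's exact code):

```python
import torch as t

# Illustration: in LSQ the step-size gradient is scaled by g = 1 / sqrt(N * Q_p).
# With a per-tensor s, N is the element count of the whole weight tensor; with a
# per-channel s, each scale only "sees" its own output channel's weights, so N
# shrinks (and g grows) accordingly.
def grad_scale_factor(weight, thd_pos, per_channel=False):
    n = weight[0].numel() if per_channel else weight.numel()
    return 1.0 / ((n * thd_pos) ** 0.5)

w = t.randn(32, 16, 3, 3)                  # e.g. a 3x3 conv with 32 output channels
g_tensor = grad_scale_factor(w, 7)          # per-tensor, 4-bit signed weights -> Q_p = 7
g_channel = grad_scale_factor(w, 7, True)   # per-channel: larger g because N is smaller
```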

zhutmost avatar Apr 14 '21 10:04 zhutmost


Thanks~

talenz avatar Apr 21 '21 01:04 talenz