lsq-net
Did you see improved performance using per-channel weight quantization?
Hi, great implementation! Since per-channel weight quantization is implemented in your code, I'm wondering if there is any improvement compared to per-tensor weight quantization.
I have tried it on ResNet/ImageNet, but I found that choosing the initial value of the hyperparameter s is very tricky. I am not sure how I should modify the original expression:
self.s = t.nn.Parameter(x.detach().abs().mean() * 2 / (self.thd_pos ** 0.5))
I have tried a few variants, but none of them reach an accuracy as high as the original (i.e., without per-channel quantization). (And I don't have enough GPUs to run many experiments; it eats up a lot of my spare time. :D)
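For reference, here is the kind of per-channel variant I experimented with. This is only a sketch of my own attempt, not code from this repo; it assumes the output-channel dimension of the weight is dim 0 and that the quantizer broadcasts s against the weight:

import torch as t

def init_scale_per_channel(x: t.Tensor, thd_pos: int) -> t.nn.Parameter:
    # Per-channel analogue of the per-tensor init above: average |x| over
    # every dimension except the output-channel dim (dim 0).
    # For a conv weight of shape (C_out, C_in, kH, kW) this yields an s of
    # shape (C_out, 1, 1, 1), which broadcasts against the weight.
    s = x.detach().abs().mean(dim=list(range(1, x.dim())), keepdim=True) * 2 / (thd_pos ** 0.5)
    return t.nn.Parameter(s)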
I used your implementation on MobileNetV2 @ ImageNet and quantized only the conv weights to 4 bits (fc weights and activations stay in float). It didn't work well: even with per-channel quantization, the top-1 accuracy only reaches about 68% (float is 71.88%). Any advice?
You can try modifying: 1) the scaling factor of the gradients, and 2) the initialization value of s. You can also read another paper, LSQ+ (https://arxiv.org/abs/2004.09576), which analyzes the disadvantages of LSQ and offers some advice.
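For point 1), here is roughly what I mean (a sketch only: the straight-through grad-scale trick is the standard one, but the per-channel weight count is just a guess of mine, since the LSQ paper only defines g = 1 / sqrt(n_weights * Q_P) for per-tensor quantization):

import torch as t

def grad_scale(x: t.Tensor, scale: float) -> t.Tensor:
    # Straight-through trick: the forward pass returns x unchanged, while
    # the backward pass multiplies the gradient of x by `scale`.
    return (x - x * scale).detach() + x * scale

def s_grad_scale_per_channel(w: t.Tensor, thd_pos: int) -> float:
    # One possible per-channel adaptation (not from the paper): count only
    # the weights belonging to each output channel instead of the whole
    # tensor when computing g = 1 / sqrt(n_weights * Q_P).
    n_per_channel = w.numel() // w.shape[0]
    return 1.0 / ((n_per_channel * thd_pos) ** 0.5)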
Thanks~