llm-awq
Question about Auto Scale Process
- In auto_scale.py, I find that awq no longer considers the weight scale when searching for the best scales. Is this because considering only the activation scale already gives results similar to considering both?
- When searching for the best scales, `(org_out - out).float().pow(2).mean().item()` is used as the metric. But when computing the `out` result after rescaling, shouldn't the input x be divided by the scales, to align with the apply_scale process?
Hi,
- We found that adding the weight scale does not improve performance, so we removed it for simplicity.
- Yes, but we apply the division to the weight instead, for ease of implementation (see here: https://github.com/mit-han-lab/llm-awq/blob/f0b4b68004f76d562658143cddea5aad8c1b8266/awq/quantize/auto_scale.py#L128).
Hope it addresses your questions!
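The equivalence mentioned in the reply above can be checked numerically. The sketch below (my own toy round-to-nearest quantizer standing in for `w_quantize_func`; all numbers are illustrative, not from the repo) shows that dividing the input by the scales and folding the division into the quantized weight give the same output, since the scales act per input channel:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))        # activations: (batch, in_features)
w = rng.standard_normal((5, 4))        # weights: (out_features, in_features)
s = np.array([2.0, 0.5, 1.0, 4.0])     # per-input-channel scales

def quantize(w, bits=4):
    # toy symmetric round-to-nearest quantizer, one step size per output row
    delta = np.abs(w).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    return np.round(w / delta) * delta

wq = quantize(w * s)                   # quantize the up-scaled weight
out_div_x = (x / s) @ wq.T             # divide the input by the scales ...
out_div_w = x @ (wq / s).T             # ... or fold the division into the weight
assert np.allclose(out_div_x, out_div_w)
```

Both forms compute sum_j x[:, j] * wq[:, j] / s[j], so they are mathematically identical; the weight-side division just avoids touching the activations during the search.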
Thanks for your reply! Your answer completely solves my problem.
Hello, I still have a question here. Why not quantize first and then divide, like this?
fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)
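One way to see why the order matters: quantization is nonlinear, so `Q(W * s) / s` approximates `W` (rounding happens on the scaled grid, which is the whole point of protecting salient channels), while `Q(W) / s` approximates `W / s` and merely rescales already-rounded values. A minimal sketch with a toy round-to-nearest quantizer (my own stand-in for `w_quantize_func`; the scale value is illustrative):

```python
import numpy as np

def quantize(w, bits=4):
    # toy symmetric round-to-nearest quantizer, one step size per output row
    delta = np.abs(w).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1)
    return np.round(w / delta) * delta

w = np.array([[0.1, 1.0]])      # a small salient weight next to a large one
s = np.array([4.0, 1.0])        # hypothetical scale protecting the first channel

w_awq   = quantize(w * s) / s   # scale, then quantize, then divide (repo order)
w_naive = quantize(w) / s       # quantize first, then divide (proposed order)

err_awq   = np.abs(w_awq - w).max()
err_naive = np.abs(w_naive - w).max()
assert err_awq < err_naive      # scaling before rounding preserves W better
```

In the repo's order, the scaled channel lands on finer effective grid points after the division; in the quantize-first order, the rounding error is fixed before the division, so dividing only shifts the reconstructed weight away from `W`.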