
Question about the Auto-Scale Process

Open rainyBJ opened this issue 2 years ago • 3 comments

  1. In auto_scale.py, I notice that AWQ no longer considers weight_scale when searching for the proper scales. Is this because considering only act_scale gives results similar to considering both?
  2. When searching for the best scales, `(org_out - out).float().pow(2).mean().item()` is used as the metric. But when computing the `out` result after rescaling, shouldn't we also divide the input x by the scales, to match the apply_scale process?

rainyBJ avatar Nov 03 '23 07:11 rainyBJ
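
For context, the scale search the second question refers to can be sketched roughly as follows. This is a simplified pure-Python illustration, not the actual auto_scale.py code: the per-row quantizer, the `act_scale ** ratio` grid, and all helper names here are stand-ins for AWQ's grouped quantization and module-level search.

```python
import random

random.seed(0)

def quantize_rows(w, n_bits=4):
    # simplified per-row round-to-nearest quantizer
    # (a stand-in for AWQ's grouped quantizer)
    q_max = 2 ** (n_bits - 1) - 1
    out = []
    for row in w:
        step = max(abs(v) for v in row) / q_max
        out.append([max(-q_max - 1, min(q_max, round(v / step))) * step
                    for v in row])
    return out

def matmul(x, w):
    # y[i][j] = sum_k x[i][k] * w[j][k]  (linear layer with weight rows)
    return [[sum(xi * wi for xi, wi in zip(xrow, wrow)) for wrow in w]
            for xrow in x]

def mse(a, b):
    flat = [(u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb)]
    return sum(flat) / len(flat)

def search_scales(x, w, n_grid=20):
    org_out = matmul(x, w)
    n_in = len(w[0])
    # per-input-channel mean absolute activation (the "act_scale")
    act_scale = [sum(abs(row[k]) for row in x) / len(x) for k in range(n_in)]
    best_err, best_scales = float("inf"), None
    for i in range(n_grid):
        ratio = i / n_grid
        scales = [max(a, 1e-4) ** ratio for a in act_scale]
        # scale the weight up, quantize, then fold the division back in
        w_scaled = [[v * s for v, s in zip(row, scales)] for row in w]
        w_q = [[q / s for q, s in zip(row, scales)]
               for row in quantize_rows(w_scaled)]
        err = mse(org_out, matmul(x, w_q))  # the MSE metric from the question
        if err < best_err:
            best_err, best_scales = err, scales
    return best_err, best_scales

x = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
w = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(6)]
best_err, best_scales = search_scales(x, w)
```

Since `ratio = 0` (all scales equal to 1, i.e. plain quantization) is included in the grid, the search can only match or improve on the unscaled quantization error.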

Hi,

  1. We found that including the weight scale does not improve performance, so we removed it for simplicity.
  2. Yes, but we apply the division to the weight instead, for ease of implementation (see https://github.com/mit-han-lab/llm-awq/blob/f0b4b68004f76d562658143cddea5aad8c1b8266/awq/quantize/auto_scale.py#L128)

Hope it addresses your questions!

tonylins avatar Nov 04 '23 04:11 tonylins
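
The folding described in point 2 can be checked numerically: dividing the input by the scales and folding the division into the quantized weight give the same layer output. A minimal sketch, assuming a toy round-to-nearest quantizer in place of the real `w_quantize_func`:

```python
import math
import random

random.seed(0)

def w_quantize(w, n_bits=4):
    # toy per-tensor round-to-nearest quantizer (illustrative only;
    # AWQ actually uses grouped min-max quantization)
    q_max = 2 ** (n_bits - 1) - 1
    step = max(abs(v) for v in w) / q_max
    return [max(-q_max - 1, min(q_max, round(v / step))) * step for v in w]

x = [random.uniform(-1, 1) for _ in range(16)]
w = [random.uniform(-1, 1) for _ in range(16)]
s = [random.uniform(0.5, 1.5) for _ in range(16)]

# Variant A: divide the input by the scales, multiply against Q(w * s)
qw = w_quantize([wi * si for wi, si in zip(w, s)])
out_a = sum((xi / si) * qi for xi, si, qi in zip(x, s, qw))

# Variant B: fold the division into the weight (what auto_scale.py does)
w_folded = [qi / si for qi, si in zip(qw, s)]
out_b = sum(xi * wi for xi, wi in zip(x, w_folded))

assert math.isclose(out_a, out_b, rel_tol=1e-9, abs_tol=1e-12)
```

Both variants compute `sum_k (x_k / s_k) * Q(w * s)_k`; folding just moves the division from the activations to the weight, so no runtime rescaling of the input is needed.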

Thanks for your reply! Your answer completely resolves my question.

rainyBJ avatar Nov 05 '23 09:11 rainyBJ


Hello, I still have a question here. Why not quantize first and then divide, like this?

fc.weight.data = w_quantize_func(fc.weight.data) / scales.view(1, -1)

songh11 avatar Apr 17 '24 08:04 songh11
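
The question above is left open in the thread. Purely as a sketch of how the two orderings differ algebraically (not a maintainer answer; `rtn` and its step size below are made-up stand-ins for illustration): with a no-op quantizer, `Q(w * s) / s` recovers `w` exactly, so the transform leaves the layer output unchanged, whereas `Q(w) / s` would shrink the weight by `s` even with perfect quantization.

```python
w, s = 0.8, 4.0
q = lambda v: v  # identity stand-in for w_quantize_func, to isolate the algebra

assert q(w * s) / s == w      # scale, quantize, divide: equivalence preserved
assert q(w) / s == w / s      # quantize first, then divide: weight becomes w / s

# With real rounding, scaling up before quantization also shrinks the
# effective rounding error on that channel by a factor of s:
def rtn(v, step=0.25):
    # toy fixed-step round-to-nearest quantizer (hypothetical step size)
    return round(v / step) * step

w2 = 0.1
err_scaled = abs(rtn(w2 * s) / s - w2)  # rounding error is divided by s
err_plain = abs(rtn(w2) - w2)           # full rounding error
assert err_scaled < err_plain
```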