W4A16 kernel error when group_size is not 128
Hi,
Thanks for your interesting work and clear open-source code.
I have been trying to test the W4A16 kernel with different quantization group sizes, and I have found that this kernel only produces correct outputs when group_size is set to 128.
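For reference, here is what `q_group_size` controls: each row of the weight matrix is split into contiguous groups of that many elements, and every group gets its own scale and zero point. The following is an illustrative stand-alone sketch of asymmetric group-wise quantization, not AWQ's actual `pseudo_quantize_tensor`:

```python
import torch

def pseudo_quantize_groupwise(w, n_bit=4, group_size=128):
    """Illustrative asymmetric per-group fake quantization (not the AWQ code).

    Each contiguous run of `group_size` elements gets its own scale/zero point.
    """
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                      # (num_groups, group_size)
    w_max = w.amax(dim=1, keepdim=True)
    w_min = w.amin(dim=1, keepdim=True)
    q_max = 2 ** n_bit - 1                             # 15 for 4-bit
    scales = (w_max - w_min).clamp(min=1e-5) / q_max   # per-group scale
    zeros = (-w_min / scales).round()                  # per-group zero point
    q = (w / scales + zeros).round().clamp(0, q_max)   # integer codes in [0, q_max]
    w_dq = (q - zeros) * scales                        # dequantized weights
    return w_dq.reshape(orig_shape), scales, zeros
```

A larger `group_size` means fewer scale/zero pairs to store but coarser quantization, which is why the fake-quantized reference itself changes with the group size.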
For example, I tested the W4A16 kernel with the following code:
```python
import torch
from awq.quantize.quantizer import pseudo_quantize_tensor
from awq.quantize.qmodule import WQLinear

w_bit = 4
q_group_size = 128

inputs = torch.randn((1, 4096, 4096)).cuda().half()
module = torch.nn.Linear(4096, 4096, True).cuda().half()

# Fake-quantize the weights and keep the per-group scales/zeros.
module.weight.data, scales, zeros = pseudo_quantize_tensor(
    module.weight.data, n_bit=w_bit, get_scale_zp=True, q_group_size=q_group_size
)
fake_outputs = module(inputs)

# Pack the same weights into the real W4A16 kernel and compare.
scales = scales.t().contiguous()
zeros = zeros.t().contiguous()
q_linear = WQLinear.from_linear(module, w_bit, q_group_size, False, scales, zeros)
real_outputs = q_linear(inputs)

print(f"average dist:{(real_outputs - fake_outputs).abs().mean()}")
```
With `q_group_size=128`, the gap is negligible:

```
average dist:0.00014293193817138672
```
However, when `q_group_size` is set to any other value, the gap becomes significant. Taking `group_size=256` as an example, the output is:

```
average dist:0.32958984375
```
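Until the kernel handles other group sizes, one workaround is to fail fast on unsupported sizes rather than silently getting wrong outputs. This is a hypothetical guard, not part of the AWQ API, and the supported set below is an assumption based on the behavior observed above:

```python
# Assumption: only these group sizes are known to produce correct kernel output.
SUPPORTED_GROUP_SIZES = {128}

def check_group_size(group_size: int) -> None:
    """Raise early instead of letting the W4A16 kernel return wrong results."""
    if group_size not in SUPPORTED_GROUP_SIZES:
        raise ValueError(
            f"group_size={group_size} is not supported by the W4A16 kernel; "
            f"use one of {sorted(SUPPORTED_GROUP_SIZES)}"
        )
```

Calling `check_group_size(q_group_size)` before `WQLinear.from_linear` would have turned the silent 0.33 average error into an immediate, explicit failure.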
Is there anything I can do to resolve this?
Hey @ChenMnZ,
You may want to try the dev/more_models branch. There, the developers have added support for group size 64, since models such as tiiuae/falcon-7b-instruct cannot use group size 128.