W4A16 kernel error when group_size is not 128
Hi,
Thanks for your interesting work and clear open-source code.
I have been trying to test the W4A16 kernel with different quantization group sizes, and I have found that this kernel only produces correct outputs when group_size is set to 128.
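For reference, here is what `q_group_size` controls: each row of the weight matrix is split into contiguous groups of that many elements, and every group gets its own scale and zero point. The following is an illustrative stand-alone sketch of asymmetric group-wise quantization, not AWQ's actual `pseudo_quantize_tensor`:

```python
import torch

def pseudo_quantize_groupwise(w, n_bit=4, group_size=128):
    """Illustrative asymmetric per-group fake quantization (not the AWQ code).

    Each contiguous run of `group_size` elements gets its own scale/zero point.
    """
    orig_shape = w.shape
    w = w.reshape(-1, group_size)                      # (num_groups, group_size)
    w_max = w.amax(dim=1, keepdim=True)
    w_min = w.amin(dim=1, keepdim=True)
    q_max = 2 ** n_bit - 1                             # 15 for 4-bit
    scales = (w_max - w_min).clamp(min=1e-5) / q_max   # per-group scale
    zeros = (-w_min / scales).round()                  # per-group zero point
    q = (w / scales + zeros).round().clamp(0, q_max)   # integer codes in [0, q_max]
    w_dq = (q - zeros) * scales                        # dequantized weights
    return w_dq.reshape(orig_shape), scales, zeros
```

A larger `group_size` means fewer scale/zero pairs to store but coarser quantization, which is why the fake-quantized reference itself changes with the group size.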
For example, I tested the W4A16 kernel with the following code:
```python
import torch
from awq.quantize.quantizer import pseudo_quantize_tensor
from awq.quantize.qmodule import WQLinear

w_bit = 4
q_group_size = 128

inputs = torch.randn((1, 4096, 4096)).cuda().half()
module = torch.nn.Linear(4096, 4096, True).cuda().half()

# Fake-quantize the weights and keep the per-group scales/zeros.
module.weight.data, scales, zeros = pseudo_quantize_tensor(
    module.weight.data, n_bit=w_bit, get_scale_zp=True, q_group_size=q_group_size
)
fake_outputs = module(inputs)

# Pack the same weights into the real W4A16 kernel and compare.
scales = scales.t().contiguous()
zeros = zeros.t().contiguous()
q_linear = WQLinear.from_linear(module, w_bit, q_group_size, False, scales, zeros)
real_outputs = q_linear(inputs)

print(f"average dist:{(real_outputs - fake_outputs).abs().mean()}")
```
With `q_group_size=128`, the gap is negligible:

```
average dist:0.00014293193817138672
```
However, when `q_group_size` is set to any other value, the gap becomes significant. Taking `group_size=256` as an example, the output is:

```
average dist:0.32958984375
```
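Until the kernel handles other group sizes, one workaround is to fail fast on unsupported sizes rather than silently getting wrong outputs. This is a hypothetical guard, not part of the AWQ API, and the supported set below is an assumption based on the behavior observed above:

```python
# Assumption: only these group sizes are known to produce correct kernel output.
SUPPORTED_GROUP_SIZES = {128}

def check_group_size(group_size: int) -> None:
    """Raise early instead of letting the W4A16 kernel return wrong results."""
    if group_size not in SUPPORTED_GROUP_SIZES:
        raise ValueError(
            f"group_size={group_size} is not supported by the W4A16 kernel; "
            f"use one of {sorted(SUPPORTED_GROUP_SIZES)}"
        )
```

Calling `check_group_size(q_group_size)` before `WQLinear.from_linear` would have turned the silent 0.33 average error into an immediate, explicit failure.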
Is there anything I can do to resolve this?
Hey @ChenMnZ,
You may want to try the dev/more_models branch. There, the developers have added support for group size 64, since models such as tiiuae/falcon-7b-instruct cannot use group size 128.