AutoAWQ
How to Split AWQ Weights?
Hello,
I am currently implementing tensor parallelism and need some guidance on how to split AWQ weights properly. Here are the shapes of the AWQ weights I'm working with:
print("Qweight Shape:", self.qweight.shape) # torch.Size([3584, 4096])
print("Scales Shape:", self.scales.shape) # torch.Size([32, 14336])
print("Scaled Zeros Shape:", self.scaled_zeros.shape) # torch.Size([32, 14336])
To split the weights, I used the following approach:
qweight_left = self.qweight[:1792, :]
scales_left = self.scales[:, :7168]
scaled_zeros_left = self.scaled_zeros[:, :7168]
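The slice sizes above follow from the shapes if each qweight row packs several consecutive output channels. A minimal sketch of that arithmetic in plain Python is below. `column_split_bounds` is a hypothetical helper, not part of AutoAWQ, and the packing assumption (14336 / 3584 = 4 output channels per qweight row, in order along dim 0) is inferred from the shapes rather than confirmed by the library.

```python
def column_split_bounds(qweight_rows, out_features, world_size, rank):
    """Return ((row_start, row_end), (col_start, col_end)) for one rank.

    Hypothetical helper. Assumes each qweight row packs
    `out_features // qweight_rows` consecutive output channels
    (an assumption inferred from the tensor shapes).
    """
    if out_features % qweight_rows != 0:
        raise ValueError("out_features must be a multiple of qweight_rows")
    if qweight_rows % world_size != 0:
        raise ValueError("qweight rows must divide evenly across ranks")

    channels_per_row = out_features // qweight_rows  # 14336 // 3584 == 4
    rows_per_rank = qweight_rows // world_size

    # Rows of qweight owned by this rank.
    row_start = rank * rows_per_rank
    row_end = row_start + rows_per_rank

    # scales / scaled_zeros are indexed by output channel along dim 1,
    # so the column range is the row range scaled by the packing factor.
    col_start = row_start * channels_per_row
    col_end = row_end * channels_per_row
    return (row_start, row_end), (col_start, col_end)


rows, cols = column_split_bounds(3584, 14336, world_size=2, rank=0)
print(rows, cols)  # (0, 1792) (0, 7168)
```

Under that assumption, rank 0 of a two-way split gets qweight rows 0 to 1792 and scale columns 0 to 7168, which matches the slices used above.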
I also created a random input of shape (1, 2048, 4096) and ran the matrix multiplication with both the original weights (giving out) and the split weights (giving out_left). However, the results do not match:
>>> torch.allclose(out_left, out[:,:,:7168])
False
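One thing that may be worth ruling out (an assumption, not a confirmed diagnosis): in PyTorch, slicing columns of a 2-D tensor, as in `self.scales[:, :7168]`, returns a non-contiguous view, while slicing rows, as in `self.qweight[:1792, :]`, stays contiguous. A custom kernel that reads the raw buffer and ignores strides would then see the wrong scales. The toy model below shows the effect in plain Python on a 2x4 row-major buffer. `read_as_dense` is a hypothetical stand-in for such a kernel, not AutoAWQ code.

```python
# Row-major storage of the 2x4 matrix [[0, 1, 2, 3], [10, 11, 12, 13]].
storage = [0, 1, 2, 3, 10, 11, 12, 13]
n_rows, n_cols = 2, 4
keep = 2  # keep the first two columns, analogous to scales[:, :7168]


def strided_view(buf, rows, row_stride, cols):
    """What the column slice really means: honour the original row stride."""
    return [[buf[r * row_stride + c] for c in range(cols)] for r in range(rows)]


def read_as_dense(buf, rows, cols):
    """Hypothetical stride-unaware kernel: assumes row stride == cols."""
    return [[buf[r * cols + c] for c in range(cols)] for r in range(rows)]


correct = strided_view(storage, n_rows, n_cols, keep)
wrong = read_as_dense(storage, n_rows, keep)

# Equivalent of .contiguous(): copy the view into fresh dense storage.
compact = [x for row in correct for x in row]
fixed = read_as_dense(compact, n_rows, keep)

print(correct)  # [[0, 1], [10, 11]]
print(wrong)    # [[0, 1], [2, 3]]
print(fixed)    # [[0, 1], [10, 11]]
```

If this applies here, calling `.contiguous()` on scales_left and scaled_zeros_left before the matrix multiplication would be the first thing to try.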
Could someone advise on how to correctly split the AWQ weights to achieve effective tensor parallelism? Any help or suggestions would be greatly appreciated!
Thank you!