
How to Split AWQ Weights?

Open Azure-Tang opened this issue 1 year ago • 0 comments

Hello,

I am implementing tensor parallelism and need guidance on how to split AWQ weights correctly. These are the shapes of the quantized tensors in the layer I'm working with:

print("Qweight Shape:", self.qweight.shape)  # torch.Size([3584, 4096])
print("Scales Shape:", self.scales.shape)    # torch.Size([32, 14336])
print("Scaled Zeros Shape:", self.scaled_zeros.shape)  # torch.Size([32, 14336])

To split the weights, I took the first half of the rows of qweight and the first half of the columns of scales and scaled_zeros:

qweight_left = self.qweight[:1792, :]
scales_left = self.scales[:, :7168]
scaled_zeros_left = self.scaled_zeros[:, :7168]
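
For reference, here is a minimal sketch of what a correct split along the output dimension looks like on an unpacked, group-quantized weight. It is not AutoAWQ's kernel or its packed layout: the sizes are small and hypothetical, the integer weights hold one 4-bit value per element, and the convention w = q * scale + scaled_zeros (with scaled_zeros = -(scale * zero_point)) is an assumption. Packed kernel formats may interleave or reorder rows, which this sketch does not model.

```python
import torch

torch.manual_seed(0)

# Small, hypothetical sizes (not the layer from this issue).
in_features, out_features, group_size = 64, 32, 16
n_groups = in_features // group_size

# Unpacked 4-bit integer weights, one value per element (NOT a packed kernel layout).
q = torch.randint(0, 16, (in_features, out_features)).double()
scales = torch.rand(n_groups, out_features, dtype=torch.float64) + 0.5
zero_points = torch.randint(0, 16, (n_groups, out_features)).double()
# Assumed convention: scaled_zeros = -(scale * zero_point).
scaled_zeros = -(scales * zero_points)


def dequantize(q, scales, scaled_zeros, group_size):
    """Expand per-group parameters along the input dimension and dequantize."""
    s = scales.repeat_interleave(group_size, dim=0)
    z = scaled_zeros.repeat_interleave(group_size, dim=0)
    return q * s + z  # shape: [in_features, out_features]


x = torch.randn(1, 8, in_features, dtype=torch.float64)
out = x @ dequantize(q, scales, scaled_zeros, group_size)

# Column-parallel split: every tensor is sliced along the output dimension.
half = out_features // 2
out_left = x @ dequantize(q[:, :half], scales[:, :half], scaled_zeros[:, :half], group_size)
out_right = x @ dequantize(q[:, half:], scales[:, half:], scaled_zeros[:, half:], group_size)

# Concatenating the shard outputs should reproduce the full output.
split_matches = torch.allclose(torch.cat([out_left, out_right], dim=-1), out)
print("split matches full output:", split_matches)
```

If the packed tensors can be unpacked into this form, the same comparison can serve as a ground truth for checking which rows of qweight belong to which output channels.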

I also created a random input of shape (1, 2048, 4096) and performed a matrix multiplication with both the original and the split weights. However, the results do not match:

>>> torch.allclose(out_left, out[:,:,:7168])
False
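
One thing worth ruling out, independent of whether the split itself is right: torch.allclose defaults to rtol=1e-05 and atol=1e-08, which is very strict for half-precision outputs. The sketch below assumes the outputs are float16 and uses stand-in data rather than real kernel outputs; the helper name compare and the looser tolerances are hypothetical choices for illustration only.

```python
import torch

torch.manual_seed(0)


def compare(a, b, rtol=1e-5, atol=1e-8):
    """Return (max absolute difference, allclose result), computed in float32."""
    a32, b32 = a.float(), b.float()
    max_diff = (a32 - b32).abs().max().item()
    return max_diff, torch.allclose(a32, b32, rtol=rtol, atol=atol)


# Stand-in data: two float16 tensors that differ only by small noise,
# imitating two code paths that round or accumulate slightly differently.
ref = torch.randn(1, 16, 64)
out_a = ref.half()
out_b = (ref + 1e-3 * torch.randn_like(ref)).half()

# Default tolerances versus hypothetical looser ones for float16.
max_diff, strict_ok = compare(out_a, out_b)
_, loose_ok = compare(out_a, out_b, rtol=1e-2, atol=1e-2)
print(f"max abs diff = {max_diff:.2e}, strict = {strict_ok}, loose = {loose_ok}")
```

A small maximum difference with a failing strict check points to rounding noise; a large difference points to a layout or slicing problem.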

Could someone advise on how to correctly split the AWQ weights to achieve effective tensor parallelism? Any help or suggestions would be greatly appreciated!

Thank you!

Azure-Tang, Sep 28 '24 07:09