
[BUG] bitlinear fix

Open jayUyang opened this issue 11 months ago • 4 comments

Shouldn't the beta and gamma sizes be `(1, weight.shape[0])`, not `(weight.shape[0], 1)`?

jayUyang avatar Mar 10 '24 11:03 jayUyang

Can you elaborate, please? Can you go deeper?

kyegomez avatar Mar 10 '24 17:03 kyegomez

I encountered the same problem. When passing a `(4, 2)` int tensor to a `BitLinear(2, 8)`, I get an error at the line `return x * self.gamma * self.beta / self.Q_b`:

```
Exception has occurred: RuntimeError
The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
  File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 112, in dequantize_activations_groupwise
    return x * self.gamma * self.beta / self.Q_b
  File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 137, in forward
    output = self.dequantize_activations_groupwise(output)
  File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 20, in forward
    x = self.layer1(x)
  File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 39, in
    outputs = model(inputs)  # Forward pass
RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
```

I think the shapes of `self.gamma` and `self.beta` are wrong: gamma is initialized based on the number of output neurons but is set based on batch size.
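For what it's worth, the dimension-0 mismatch is reproducible with a minimal sketch (shapes taken from the report above; the `gamma`/`beta` layouts are illustrative, not the repo's exact code), and it also shows why the `(1, out_features)` layout from the original question broadcasts correctly:

```python
import torch

batch, out_features = 4, 8

# Scales stored as (out_features, 1), as in the reported bug
gamma = torch.ones(out_features, 1)
beta = torch.ones(out_features, 1)
x = torch.randn(batch, out_features)  # layer output, shape (4, 8)

try:
    _ = x * gamma * beta  # (4, 8) * (8, 1): dim 0 mismatch, 4 vs 8
except RuntimeError as e:
    print(e)  # "The size of tensor a (4) must match the size of tensor b (8) ..."

# Reshaping to (1, out_features) broadcasts across the batch dimension instead:
out = x * gamma.view(1, -1) * beta.view(1, -1)
print(out.shape)  # torch.Size([4, 8])
```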

Vipiao avatar Mar 14 '24 09:03 Vipiao

> I encountered the same problem. [...] I think the shapes of the self.gamma and self.beta shapes are wrong. Gamma is initialized based on # output neuron shape but is set based on batch size

I think so, but I am confused: since `self.gamma` relates to the activations while `self.beta` relates to the weights, should we explicitly broadcast these two matrices? In particular, should the activation quantization (`group_size = x.shape[0] // self.num_groups`) be grouped along dim=1 (`x.shape[1]`) rather than the batch dimension, so that `x * self.gamma * self.beta` in the dequantization step is a valid Hadamard product? If I am wrong, please point it out. Thanks.
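One way the Hadamard product can broadcast cleanly is to keep the activation scale per sample and the weight scale per output neuron, so the two expand into the full `(batch, out_features)` grid. This is only a sketch of the broadcasting idea, with hypothetical scale layouts rather than the repo's actual code:

```python
import torch

batch, out_features = 4, 8
Q_b = 127.0  # illustrative quantization bound

x = torch.randn(batch, out_features)       # quantized layer output

# Hypothetical layouts (assumptions, not the repo's code):
gamma = torch.rand(batch, 1) + 0.1         # activation scale, one per sample
beta = torch.rand(1, out_features) + 0.1   # weight scale, one per output neuron

# (batch, 1) * (1, out_features) broadcasts to (batch, out_features),
# so every element gets its row's activation scale and its column's weight scale.
dequant = x * gamma * beta / Q_b
print(dequant.shape)  # torch.Size([4, 8])
```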

zouyingcao avatar Mar 26 '24 07:03 zouyingcao


Emmm, I see the owner has updated the code (it no longer uses group quantization).

zouyingcao avatar Mar 26 '24 07:03 zouyingcao

Stale issue message

github-actions[bot] avatar May 25 '24 12:05 github-actions[bot]