BitNet
[BUG] bitlinear fix
Should the beta and gamma sizes be (1, weight.shape[0]) rather than (weight.shape[0], 1)?
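For illustration, a minimal broadcasting sketch in plain PyTorch (hypothetical shapes: a batch of 4 through a layer with 8 output neurons; not the repo's actual code):

```python
import torch

batch, out_features = 4, 8
output = torch.randn(batch, out_features)   # activations leaving the linear layer

scale_col = torch.ones(out_features, 1)     # (weight.shape[0], 1) -> (8, 1)
scale_row = torch.ones(1, out_features)     # (1, weight.shape[0]) -> (1, 8)

# output * scale_col  # RuntimeError: tensor a (4) vs tensor b (8) at dimension 0
print((output * scale_row).shape)           # torch.Size([4, 8]): broadcasts over the batch
```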
Can you elaborate, please? Can you go deeper?
I encountered the same problem. When passing a (4, 2) tensor of ints into a BitLinear(2, 8), I get an error at the line

return x * self.gamma * self.beta / self.Q_b

saying:

"
Exception has occurred: RuntimeError
The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 112, in dequantize_activations_groupwise
return x * self.gamma * self.beta / self.Q_b
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\bitlinear.py", line 137, in forward
output = self.dequantize_activations_groupwise(output)
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 20, in forward
x = self.layer1(x)
File "C:\Users\Markus\OneDrive\phd\NYCU\research\bit_net\xor_test_bitlinear.py", line 39, in <module>
outputs = model(inputs)  # Forward pass
RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0
"

I think the shapes of self.gamma and self.beta are wrong. Gamma is initialized based on the output-neuron shape but is set based on the batch size.
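For what it's worth, here is a self-contained sketch of the mismatch being described, reconstructed from the traceback (plain PyTorch, hypothetical variable names; not the repo's actual code):

```python
import torch

batch, in_features, out_features, num_groups = 4, 2, 8, 2
x = torch.randn(batch, in_features)

# gamma allocated per output neuron: shape (8, 1)...
gamma = torch.zeros(out_features, 1)

# ...but group-wise activation quantization groups along dim 0 (the batch dim),
# so group_size = x.shape[0] // num_groups depends on the batch size, not the features
group_size = x.shape[0] // num_groups
for g in range(num_groups):
    gamma[g] = x[g * group_size:(g + 1) * group_size].abs().max()

output = torch.randn(batch, out_features)   # stand-in for the layer output
output * gamma                              # RuntimeError: tensor a (4) vs tensor b (8) at dim 0
```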
I think so, but I am confused: since self.gamma relates to the activations while self.beta relates to the weights, should we explicitly broadcast these two matrices so that 'x * self.gamma * self.beta' in the dequantization step can act as a Hadamard product? In particular, shouldn't the activation quantization ('group_size = x.shape[0] // self.num_groups') group along dim=1 (x.shape[1]) rather than dim=0, since dim=0 is the batch size? If I am wrong, please point it out. Thanks.
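A sketch of what grouping along dim=1 might look like (hypothetical helper, not the repo's code), so that the per-group scales never depend on the batch size and broadcast cleanly in dequantization:

```python
import torch

def quantize_activations_groupwise(x, num_groups=2, b=8, eps=1e-5):
    # Hypothetical sketch: group along the feature dim (dim=1), not the batch dim,
    # so the batch size never enters the scale shapes.
    Q_b = 2 ** (b - 1)
    batch, features = x.shape
    xg = x.view(batch, num_groups, features // num_groups)
    # one absmax scale per group, shape (1, num_groups, 1): broadcasts over the batch
    gamma = xg.abs().amax(dim=(0, 2), keepdim=True).clamp(min=eps)
    x_q = torch.clamp(xg * Q_b / gamma, -Q_b + eps, Q_b - eps)
    return x_q.view(batch, features), gamma

x = torch.randn(4, 8)
x_q, gamma = quantize_activations_groupwise(x)
print(x_q.shape, gamma.shape)   # torch.Size([4, 8]) torch.Size([1, 2, 1])
```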
Hmm, I see the owner has updated the code (without group quantization).
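For reference, a minimal sketch of the non-grouped absmax activation quantization from the BitNet paper (gamma = ||x||_inf over the whole tensor); the function name is illustrative, not the repo's exact API:

```python
import torch

def quantize_activations(x, b=8, eps=1e-5):
    # Per-tensor absmax: a single scalar gamma = ||x||_inf,
    # so x_q * gamma / Q_b in dequantization broadcasts over any shape.
    Q_b = 2 ** (b - 1)
    gamma = x.abs().max().clamp(min=eps)
    x_q = torch.clamp(x * Q_b / gamma, -Q_b + eps, Q_b - eps)
    return x_q, gamma
```

With a single scalar scale there is nothing left to misalign, which sidesteps the shape bug discussed above entirely.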