brevitas
Input scale required
Hi,
I am running the example from the Brevitas GitHub page, shown below, and I get the following error:
Exception has occurred: RuntimeError
Input scale required
File "/home/brevitas/proxy/parameter_quant.py", line 196, in forward
raise RuntimeError("Input scale required")
File "/home/brevitas/nn/quant_layer.py", line 357, in forward_impl
quant_bias = self.bias_quant(self.bias, output_scale, output_bit_width)
File "/home/brevitas/nn/quant_conv.py", line 225, in forward
return self.forward_impl(input)
File "/home/brevitas_example.py", line 31, in forward
out = self.relu1(self.conv1(x))
File "/home/brevitas_example.py", line 50, in <module>
ret = lpln(timage)
What does 'Input scale required' mean? How do I scale the input? Is there an example? One more question: what input do the layers in Brevitas expect? For example, should I pass float, int, or uint8 data to qnn.QuantLinear?
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torch.nn import Module

import brevitas.nn as qnn
# BiasQuant is not defined in the post; it is assumed to be a bias quantizer
# imported from brevitas.quant, as in the README example.


class LowPrecisionLeNet(Module):
    def __init__(self):
        super(LowPrecisionLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.conv1 = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu1 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.conv2 = qnn.QuantConv2d(6, 16, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu2 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc1 = qnn.QuantLinear(16*5*5, 120, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu3 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2 = qnn.QuantLinear(120, 84, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu4 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc3 = qnn.QuantLinear(84, 10, bias=False, weight_bit_width=3)

    def forward(self, x):
        # out = self.quant_inp(x)
        out = self.relu1(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = self.relu2(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.reshape(out.shape[0], -1)
        out = self.relu3(self.fc1(out))
        out = self.relu4(self.fc2(out))
        out = self.fc3(out)
        return out


if __name__ == '__main__':
    lpln = LowPrecisionLeNet()
    file = r"/home/image_1.jpg"
    # image = cv2.imread(file, -1)/255.0
    image = Image.open(file)
    image = np.array(image)/2550.0
    timage = torch.unsqueeze(torch.from_numpy(image).type(torch.float), 0)
    ret = lpln(timage)
Hi,
This depends on which bias quantizer you use. Most bias quantizers, e.g. Int8Bias, Int16Bias, and Int32Bias, require the input to be quantized.
If you do not want to quantize the input, you can use a quantizer like Int8BiasPerTensorFixedPointInternalScaling,
which does not require the input to be quantized.
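For context, here is a minimal pure-Python sketch of why such a bias quantizer needs the input scale. It assumes the common integer-inference scheme in which the bias is added to an accumulator whose scale is the product of the input scale and the weight scale. The helper name `quantize_bias` and the example values are hypothetical, and this is not the Brevitas implementation. In the posted forward(), the quant_inp call is commented out, so conv1 receives a plain float tensor that carries no scale.

```python
def quantize_bias(bias, input_scale, weight_scale, bit_width=32):
    """Quantize a float bias onto the accumulator grid (hypothetical helper)."""
    if input_scale is None:
        # Without a quantized input there is no accumulator scale to reuse.
        raise RuntimeError("Input scale required")
    # Assumption: accumulator scale = input scale * weight scale.
    bias_scale = input_scale * weight_scale
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    # Round to the nearest integer step and clamp to the signed range.
    int_bias = [max(q_min, min(q_max, round(b / bias_scale))) for b in bias]
    return int_bias, bias_scale


int_bias, bias_scale = quantize_bias([0.5, -0.25], input_scale=0.125, weight_scale=0.25)
print(int_bias, bias_scale)
```

A quantizer with its own internal scale sidesteps this dependency, which is why it works on an unquantized input.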
@MohamedA95 Thank you!
I have tried my best to understand the Brevitas code, but it is really hard to follow.
Is there any documentation for Brevitas? A document describing the class structure of Brevitas would be especially useful.
Hi, I am not sure what the current state of the documentation is, as I am not part of the team behind Brevitas. But there are a couple of notebooks here that are helpful. Also, the Gitter channel has many helpful answers, and the ARCHITECTURE.md file is useful too. Overall, I would advise you not to put too much effort into tracing where each parameter goes and how it is processed (unless this is your goal), as inheritance is heavily used in Brevitas. Just trust that it works.
@MohamedA95 Thank you!
To be honest, my goal is to port the quantization code to C and run it on an embedded MCU, so understanding how the code works is very important to me.
I will check the documents you pointed out. Thank you again!
@MohamedA95 Many thanks!
In fact, I have previously read some quantization papers, such as https://arxiv.org/pdf/2004.09602.pdf.
My problem right now is that I want to use Brevitas and want to know how it works.
Please check the following test code.
Currently, my questions are:
- For the QuantConv2d layer, I want the input, weight, bias, and output to be quantized. How do I do that?
- Since weight_quant: Optional[WeightQuantType] = Int8WeightPerTensorFloat, my understanding is that the weights should be quantized to 8-bit data. However, I found that in wq the weights are still float?
- I would like to check the code where float is quantized to int8. Could you please tell me the exact function/file where the quantization is done?
import torch
from brevitas.nn import QuantConv2d

# Conv layer with the default quantizers and constant weights.
default_quant_conv = QuantConv2d(in_channels=2, out_channels=3, kernel_size=(3, 3), bias=False)
torch.nn.init.constant_(default_quant_conv.weight, 0.1)

# input = torch.randn(1, 2, 5, 5)
input = [[[1,2,3,4,5], [11,22,33,44,55], [12,13,14,15,16], [22,23,24,25,26], [33,34,35,36,37]],
         [[11,12,13,14,15], [21,22,31,41,51], [31,32,33,34,36], [42,43,44,45,46], [51,52,53,54,56]]]
# input = np.ones((2,5,5)).tolist()
# Add a batch dimension and scale the values down.
input = torch.unsqueeze(torch.tensor(input,dtype=torch.float32), 0)/100.0
out = default_quant_conv(input)
wq = default_quant_conv.quant_weight()
Hi, AFAIK, Brevitas follows the same quantization scheme as the paper I shared previously. Regarding your questions:
- In each layer you can define weight_quant, bias_quant, input_quant, and output_quant. Some layers have default quantizers; for example, the conv layer uses Int8WeightPerTensorFloat as the default weight quantizer, check here.
- Yes, as far as I know, there is no fixed-point data type in Python. Brevitas stores the parameters in float and exposes three ways to access the weights: .weight gives the original weights; .quant_weight gives the quantized weights in dequantized form, that is, as floats; .int_weight gives the quantized weights that you are looking for. How you interpret them depends on the type of quantization, either integer or fixed point.
- I do not know.
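To make the relationship between the three views concrete, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It assumes the scale is derived from the largest absolute weight and that at least one weight is nonzero. The function `quantize_per_tensor` is hypothetical and only mirrors the idea, not the Brevitas internals. The local names are chosen to echo .weight, .quant_weight, and .int_weight.

```python
def quantize_per_tensor(weights, bit_width=8):
    """Symmetric per-tensor quantization (illustrative, not Brevitas internals)."""
    q_max = 2 ** (bit_width - 1) - 1   # 127 for 8 bits
    q_min = -q_max - 1                 # -128 for 8 bits
    # Assumption: the scale maps the largest magnitude onto q_max.
    scale = max(abs(w) for w in weights) / q_max
    # Integer representation: round to the nearest step, then clamp.
    int_weight = [max(q_min, min(q_max, round(w / scale))) for w in weights]
    # Dequantized representation: still float, but restricted to the integer grid.
    quant_weight = [q * scale for q in int_weight]
    return int_weight, quant_weight, scale


weight = [0.1, -0.04, 0.02]                       # original float weights
int_weight, quant_weight, scale = quantize_per_tensor(weight)
print(int_weight)    # [127, -51, 25]
print(quant_weight)  # floats, each within scale / 2 of the original weight
```

Under these assumptions, the float values a user sees in the quantized weights are just the integers multiplied by the scale, which is why they still print as floats.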
Hi @MohamedA95 Thank you!
For question 2) above, I have a follow-up question:
As you said, we can access .weight, .quant_weight and .int_weight to get different formats of the weights. The question is: in the QuantConv2d layer, what type of data is used for the calculation? Does it depend on what we set for weight_quant?
For example, if we set weight_quant to Int8WeightPerTensorFloat, the int8 format of the weight is used for the calculation in the layer. Is my understanding correct?
I am not sure what type is actually used, but as far as I understand it can be float. You do not have to use int8 to get values in the int8 range. In other words, you can do the quantization in float using the equation Real_value = Scale * (Quant_Value - Zero_point), then clip the values and cast the result to int. Also, AFAIK, Python does not support a variable bit-width data type.
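As a rough illustration of why the computation can stay in float, the pure-Python sketch below compares a dot product computed on dequantized float values with the same dot product computed on integers and rescaled afterwards. It assumes the usual affine convention real = scale * (q - zero_point). The helper `fake_quantize` and all values are hypothetical, with scales chosen as powers of two so the arithmetic is exact.

```python
def fake_quantize(x, scale, zero_point=0, bit_width=8):
    """Return (integer code, dequantized float) for one value."""
    q_min = -(2 ** (bit_width - 1))
    q_max = 2 ** (bit_width - 1) - 1
    # Round onto the integer grid, shift by the zero point, then clip.
    q = max(q_min, min(q_max, round(x / scale) + zero_point))
    return q, scale * (q - zero_point)


in_scale, w_scale = 0.5, 0.25
inputs = [1.0, -2.5, 3.0]
weights = [0.25, 0.75, -0.5]

q_in, dq_in = zip(*(fake_quantize(x, in_scale) for x in inputs))
q_w, dq_w = zip(*(fake_quantize(w, w_scale) for w in weights))

# Float path: multiply-accumulate on the dequantized values.
float_result = sum(a * b for a, b in zip(dq_in, dq_w))

# Integer path: accumulate integer products, rescale once at the end.
int_accumulator = sum(a * b for a, b in zip(q_in, q_w))
int_result = int_accumulator * in_scale * w_scale

print(float_result, int_result)  # both are -3.125
```

For a port to C on an MCU, the integer path is the one to reproduce: integer multiply-accumulate, followed by a single rescale with the combined scale.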