
Input scale required

ardeal opened this issue 1 year ago · 9 comments

Hi,

I am running the example from the Brevitas GitHub page, as follows.

I am experiencing the following issue:

Exception has occurred: RuntimeError
Input scale required
  File "/home/brevitas/proxy/parameter_quant.py", line 196, in forward
    raise RuntimeError("Input scale required")
  File "/home/brevitas/nn/quant_layer.py", line 357, in forward_impl
    quant_bias = self.bias_quant(self.bias, output_scale, output_bit_width)
  File "/home/brevitas/nn/quant_conv.py", line 225, in forward
    return self.forward_impl(input)
  File "/home/brevitas_example.py", line 31, in forward
    out = self.relu1(self.conv1(x))
  File "/home/brevitas_example.py", line 50, in <module>
    ret = lpln(timage)

What does 'Input scale required' mean? How do I scale the input? Is there an example? One more question: what input do the layers in Brevitas expect? For example, should I feed float, int, or uint8 to qnn.QuantLinear?

import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torch.nn import Module

import brevitas.nn as qnn
from brevitas.quant import Int8Bias as BiasQuant  # as in the README example this code is based on

class LowPrecisionLeNet(Module):
    def __init__(self):
        super(LowPrecisionLeNet, self).__init__()
        self.quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
        self.conv1 = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu1 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.conv2 = qnn.QuantConv2d(6, 16, 5, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu2 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc1   = qnn.QuantLinear(16*5*5, 120, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu3 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc2   = qnn.QuantLinear(120, 84, bias=True, weight_bit_width=3, bias_quant=BiasQuant, return_quant_tensor=True)
        self.relu4 = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)
        self.fc3   = qnn.QuantLinear(84, 10, bias=False, weight_bit_width=3)

    def forward(self, x):
        # out = self.quant_inp(x)
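        # NOTE: with quant_inp commented out, the float input carries no scale,
        # so conv1's bias quantizer raises "Input scale required" (see the answer below).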
        out = self.relu1(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = self.relu2(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.reshape(out.shape[0], -1)
        out = self.relu3(self.fc1(out))
        out = self.relu4(self.fc2(out))
        out = self.fc3(out)
        return out

if __name__ == '__main__':
    lpln = LowPrecisionLeNet()
    file = r"/home/image_1.jpg"
    # image = cv2.imread(file, -1)/255.0
    image = Image.open(file)
    image = np.array(image) / 255.0  # scale pixel values to [0, 1]
    timage = torch.unsqueeze(torch.from_numpy(image).type(torch.float), 0)
    ret = lpln(timage)

ardeal · Sep 23, 2022

Hi, this depends on which bias quantizer you use. Most bias quantizers, e.g. Int8Bias, Int16Bias, Int32Bias, require the input to be quantized. If you do not want to quantize the input, you can use a quantizer like Int8BiasPerTensorFixedPointInternalScaling, which derives its scale internally and does not require the input to be quantized.
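A minimal sketch of both options (my own illustration, assuming a Brevitas release where both quantizers can be imported from brevitas.quant; Int8BiasPerTensorFixedPointInternalScaling may also live under brevitas.quant.fixed_point):

    import torch
    import brevitas.nn as qnn
    from brevitas.quant import Int8Bias, Int8BiasPerTensorFixedPointInternalScaling

    x = torch.randn(1, 3, 32, 32)

    # Option 1: quantize the input so the bias quantizer can derive its scale.
    quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
    conv_a = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3, bias_quant=Int8Bias)
    out_a = conv_a(quant_inp(x))  # works: the QuantTensor input carries a scale

    # Option 2: a bias quantizer with an internal scale accepts a plain float input.
    conv_b = qnn.QuantConv2d(3, 6, 5, weight_bit_width=3,
                             bias_quant=Int8BiasPerTensorFixedPointInternalScaling)
    out_b = conv_b(x)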

MohamedA95 · Sep 23, 2022

@MohamedA95 Thank you!

I have tried my best to understand the Brevitas code, but it is really hard to understand.

Is there any documentation for Brevitas? In particular, a document describing the class structure of Brevitas would be very useful.

ardeal · Sep 23, 2022

Hi, I am not sure what the current state of the documentation is, as I am not part of the team behind Brevitas, but there are a couple of notebooks here that are helpful. Also, the Gitter channel has many helpful answers, and the ARCHITECTURE.md file is worth reading. Overall, I would advise you not to put too much effort into trying to understand where each parameter goes and how it is processed (unless that is your goal), as inheritance is used heavily in Brevitas; just trust that it works.

MohamedA95 · Sep 23, 2022

@MohamedA95 Thank you!

To be honest, my goal is to port the quantization code to C and run it on an embedded MCU, so understanding how the code works is very important to me.

I will check the document you pointed out. Thank you again!

ardeal · Sep 24, 2022

Hi @ardeal, in that case, have a look at this paper and this doc.

MohamedA95 · Sep 26, 2022

@MohamedA95 Many Thanks!

In fact, I have previously read quantization papers such as https://arxiv.org/pdf/2004.09602.pdf.

Currently, my problem is that I want to use Brevitas and need to understand how it works.

Please check the following test code.

Currently, my questions are:

  1. For a QuantConv2d layer, I want the input, weight, bias, and output all to be quantized. How do I do that?
  2. Since weight_quant: Optional[WeightQuantType] = Int8WeightPerTensorFloat, my understanding is that the weights should be quantized to 8-bit data. However, I found that in wq the weights are still float. Why?
  3. I would like to read the code that quantizes float to int8. Could you point me to the exact function/file where the quantization is done?
    import torch
    # import numpy as np
    from brevitas.nn import QuantConv2d

    default_quant_conv = QuantConv2d(in_channels=2, out_channels=3, kernel_size=(3, 3), bias=False)
    torch.nn.init.constant_(default_quant_conv.weight, 0.1)

    # input = torch.randn(1, 2, 5, 5)
    input = [[[1, 2, 3, 4, 5], [11, 22, 33, 44, 55], [12, 13, 14, 15, 16], [22, 23, 24, 25, 26], [33, 34, 35, 36, 37]],
             [[11, 12, 13, 14, 15], [21, 22, 31, 41, 51], [31, 32, 33, 34, 36], [42, 43, 44, 45, 46], [51, 52, 53, 54, 56]]]

    # input = np.ones((2, 5, 5)).tolist()
    input = torch.unsqueeze(torch.tensor(input, dtype=torch.float32), 0) / 100.0
    out = default_quant_conv(input)

    wq = default_quant_conv.quant_weight()

ardeal · Sep 27, 2022

Hi, AFAIK Brevitas follows the same quantization scheme as the paper I shared previously. Regarding your questions:

  1. In each layer you can define weight_quant, bias_quant, input_quant, and output_quant. Some layers have default quantizers; for example, the conv layer uses Int8WeightPerTensorFloat as the default weight quantizer, check here.
  2. Yes; as far as I know, there is no fixed-point data type in Python. Brevitas stores the parameters in float and exposes three ways to access the weights: .weight gives the original weights, .quant_weight gives the quantized weights in dequantized (float) format, and .int_weight gives the integer weights you are looking for (see the sketch after this list). How you interpret them depends on the type of quantization, integer or fixed point.
  3. I do not know.
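A short sketch of both points (my own illustration, not from the original comment; the quantizer choices are examples, and .quant_weight()/.int_weight() are the accessors named above):

    import torch
    import brevitas.nn as qnn
    from brevitas.quant import Int8ActPerTensorFloat, Int8WeightPerTensorFloat, Int8Bias

    # Point 1: a QuantConv2d with all four quantizers set explicitly.
    conv = qnn.QuantConv2d(
        2, 3, kernel_size=3,
        input_quant=Int8ActPerTensorFloat,      # quantize the incoming activation
        weight_quant=Int8WeightPerTensorFloat,  # the default weight quantizer
        bias_quant=Int8Bias,                    # scale derived from input and weight scales
        output_quant=Int8ActPerTensorFloat,     # quantize the output activation
        return_quant_tensor=True)
    out = conv(torch.randn(1, 2, 5, 5))

    # Point 2: the three views of the same weights.
    print(conv.weight.dtype)    # raw float parameters
    qw = conv.quant_weight()    # quantized values, stored in float ("dequantized")
    print(qw.scale, qw.bit_width)
    print(conv.int_weight())    # the underlying integer weights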

MohamedA95 · Sep 28, 2022

Hi @MohamedA95 Thank you!

For question 2) above, I have a further question: as you said, we can access .weight, .quant_weight, and .int_weight to get different formats of the weights. The question is: in a QuantConv2d layer, what data type is actually used for the computation? Does it depend on what we set for weight_quant?

For example, if we set weight_quant to Int8WeightPerTensorFloat, is the int8 form of the weights used for the computation in the layer? Is my understanding correct?

ardeal · Sep 28, 2022

I am not sure what type is actually used, but as far as I understand it can be float. You do not need an int8 data type to get values in the int8 range. In other words, you can do the quantization in float by inverting the mapping Real_value = Scale * (Quant_value - Zero_point), then clip the values and cast the result to int. Also, AFAIK, Python does not support variable-bit-width data types.
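A sketch of that idea in plain PyTorch (the names are mine for illustration, not Brevitas internals):

    import torch

    def fake_quantize(x, scale, zero_point, bit_width=8, signed=True):
        # Map to the integer grid in float and clip to the representable range...
        qmin = -(2 ** (bit_width - 1)) if signed else 0
        qmax = 2 ** (bit_width - 1) - 1 if signed else 2 ** bit_width - 1
        q = torch.round(x / scale + zero_point).clamp(qmin, qmax)
        # ...then map back: Real_value = Scale * (Quant_value - Zero_point).
        return scale * (q - zero_point)

    x = torch.randn(4)
    print(x)
    print(fake_quantize(x, scale=0.05, zero_point=0.0))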

MohamedA95 · Sep 29, 2022