
tf.matmul operator is not supported

Open tensor1to5 opened this issue 1 year ago • 20 comments

Using AIMET to quantize the model, I encountered a problem: the tf.matmul operator is not supported.

tensor1to5 avatar Aug 07 '23 07:08 tensor1to5

Hi, we use the following code

import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs) 
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)

The tf.matmul operator in the model is not supported.

AssertionError: Mismatch between number of tensors (1) and number of input quantizers (2) for layer tf.linalg.matmul

tensor1to5 avatar Aug 07 '23 07:08 tensor1to5

@quic-hitameht can you please help look at this? Thanks

quic-mangal avatar Aug 07 '23 16:08 quic-mangal

@quic-hitameht can you please help look at this? Thanks

Have you solved that?

xiexiaozheng avatar Aug 14 '23 13:08 xiexiaozheng

@tensor1to5, it looks like a bug in TF quantization. We will look into it. Thanks for reporting it.

quic-mangal avatar Aug 14 '23 18:08 quic-mangal

Closing as this was resolved in #2411

quic-ernst avatar Aug 17 '23 20:08 quic-ernst

Thanks @quic-ernst for the quick turnaround in solving this issue 💯

quic-mangal avatar Aug 17 '23 20:08 quic-mangal

@quic-ernst Hi, thanks for your help. I tested the newly revised code from GitHub, but when training the model I ran into a new problem, as follows:

sim = QuantizationSimModel(...)
sim.model.summary()

Non-trainable params is not zero:

Total params: 2,100,500
Trainable params: 1,100,000
Non-trainable params: 1,000,500
......
when minimizing the loss. If you're using 'model.compile()', did you forget to provide a 'loss' argument?

tensor1to5 avatar Aug 18 '23 09:08 tensor1to5

Yes, there are still problems; the op cannot be included in training during QAT.

leixuehui avatar Aug 18 '23 09:08 leixuehui

@tensor1to5 Hi there, for aimet_tensorflow.keras with QAT training, we have to choose which model to compile depending on which QuantScheme is used. You can reference these Jupyter notebooks: QAT and QAT with Range Learning.

In terms of the trainable/non-trainable params, we have various parameters for maintaining state, which will cause the QuantizationSimModel to have > 0 non-trainable params.
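
For reference, here is a minimal sketch of the plain-QAT flow, assuming QuantScheme.post_training_tf (as in the snippet above) and that compute_encodings has already been called; the random training data, optimizer, and loss below are placeholders, not taken from this thread:

import tensorflow as tf

# Hypothetical stand-in data shaped to match the example model above
# (input (16, 32, 3) -> output (16, 32, 32)).
x_train = tf.random.uniform((32, 16, 32, 3))
y_train = tf.random.uniform((32, 16, 32, 32))

# Compile and fine-tune sim.model directly so a loss is defined for fit().
sim.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss=tf.keras.losses.MeanSquaredError())
sim.model.fit(x_train, y_train, batch_size=4, epochs=1)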

quic-ernst avatar Aug 18 '23 17:08 quic-ernst

@quic-ernst Hi, thanks for your help. I have tried multiple quant_scheme configurations, but quantization-aware training still has problems.

tensor1to5 avatar Aug 23 '23 07:08 tensor1to5

Hi, we use the following code

import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs) 
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)

The tf.matmul operator in the model is not supported.

AssertionError: Mismatch between number of tensors (1) and number of input quantizers (2) for layer tf.linalg.matmul

@tensor1to5 Hi there. Are you seeing the same issue or a different one? Is the above your implementation and error? Thanks!

quic-ernst avatar Aug 23 '23 17:08 quic-ernst

Hi, we use the following code

import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs) 
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)

The tf.matmul operator in the model is not supported.

AssertionError: Mismatch between number of tensors (1) and number of input quantizers (2) for layer tf.linalg.matmul

I added the following code to qc_quantize_wrapper.py, and after that, the code started working properly.

elif self._is_lambda_operator_layer and 'b' in kwargs and len(self.input_quantizers) == 2:
    inputs = self._quantize_activation(inputs, [self.input_quantizers[0]], True)
    kwargs['b'] = self._quantize_activation(kwargs['b'], [self.input_quantizers[1]], True)

xiexiaozheng avatar Aug 24 '23 03:08 xiexiaozheng

@xiexiaozheng I just tried the below code and it was able to run.

import tempfile
import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
random_input = tf.random.uniform((1, 16, 32, 3))
sim.compute_encodings(lambda m, _: m(random_input), None)

with tempfile.TemporaryDirectory() as temp_dir:
    sim.export(temp_dir, "test")

print("Done.")

Could you verify you have these lines in your qc_quantize_wrapper.py file? Thank you! https://github.com/quic/aimet/blob/9914aa0e0a8d3c8b4e5b8dcd625ce5349740cc08/TrainingExtensions/tensorflow/src/python/aimet_tensorflow/keras/quant_sim/qc_quantize_wrapper.py#L321-L330

quic-ernst avatar Aug 24 '23 04:08 quic-ernst

@xiexiaozheng I just tried the below code and it was able to run.

import tempfile
import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
random_input = tf.random.uniform((1, 16, 32, 3))
sim.compute_encodings(lambda m, _: m(random_input), None)

with tempfile.TemporaryDirectory() as temp_dir:
    sim.export(temp_dir, "test")

print("Done.")

Could you verify you have these lines in your qc_quantize_wrapper.py file? Thank you!

https://github.com/quic/aimet/blob/9914aa0e0a8d3c8b4e5b8dcd625ce5349740cc08/TrainingExtensions/tensorflow/src/python/aimet_tensorflow/keras/quant_sim/qc_quantize_wrapper.py#L321-L330

"It seems you also need to add these lines of code in common.py: lambda_operators = ['operators.add', 'math.multiply', 'math.truediv', 'math.subtract', 'linalg.matmul']. Once you add this code, your provided code should work. The code you provided works because your tf.matmul has two keras.tensor inputs. However, during the conversion of my model, the positional encoding constant tensor is being converted into a list. I am still investigating the cause of this error."

xiexiaozheng avatar Aug 24 '23 05:08 xiexiaozheng

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(query_encoding, input_layer, transpose_b=True)
mymodel = tf.keras.Model(inputs=input_layer, outputs=outputs)

When I create the model this way, QuantizationSimModel throws an error; swapping the input parameters of tf.matmul allows it to pass without issues.

AssertionError: Mismatch between number of tensors (16) and number of input quantizers (1) for layer tf.linalg.matmul

xiexiaozheng avatar Aug 24 '23 07:08 xiexiaozheng

@xiexiaozheng So a few things. First, I'm not sure you want to define query_encoding like that. You set it to trainable, but Keras will consume it and convert it to a TFOpLambda layer, and Lambda layers are supposed to be stateless, meaning you won't be able to train that parameter. Lambda layers are not automatically added to the gradient calculations like in TF 1.x.

That being said, the AssertionError mentioned occurs because an initial step is skipped, as we are not expecting a tf.ResourceVariable. However, I am able to make your model work with the code below. This changes the type to a tf.Tensor/tf.EagerTensor and is able to run. I'm not sure why transpose_b=True would change this result. Please note that I have placed input_layer first to make the shapes work.

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(input_layer, tf.transpose(query_encoding))
model = tf.keras.Model(inputs=input_layer, outputs=outputs)
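
As a sanity check (my own assumption, not taken from this thread), the rewritten model should then pass through QuantizationSimModel, since query_encoding is now captured as a constant tf.Tensor rather than a tf.ResourceVariable; this reuses the imports from the earlier snippets:

sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
# Calibrate with random data; the batch size of 3 matches the Input definition above.
sim.compute_encodings(lambda m, _: m(tf.random.uniform((3, 16, 16, 256))), None)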

quic-ernst avatar Aug 24 '23 22:08 quic-ernst

@quic-ernst Hi, thanks for your help. Regarding "Lambda layers are not automatically added to the gradient calculations like in TF 1.x":

Because our deployment platform is an 8-bit/16-bit DSP core, every operator needs to participate in quantization-aware training. Is there a solution to this problem?

tensor1to5 avatar Aug 25 '23 02:08 tensor1to5

@xiexiaozheng So a few things. First, I'm not sure you want to define query_encoding like that. You set it to trainable, but Keras will consume it and convert it to a TFOpLambda layer, and Lambda layers are supposed to be stateless, meaning you won't be able to train that parameter. Lambda layers are not automatically added to the gradient calculations like in TF 1.x.

That being said, the AssertionError mentioned occurs because an initial step is skipped, as we are not expecting a tf.ResourceVariable. However, I am able to make your model work with the code below. This changes the type to a tf.Tensor/tf.EagerTensor and is able to run. I'm not sure why transpose_b=True would change this result. Please note that I have placed input_layer first to make the shapes work.

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(input_layer, tf.transpose(query_encoding))
model = tf.keras.Model(inputs=input_layer, outputs=outputs)

@quic-ernst Thanks for your answer. The variable here is meant to encode the input variables, and it needs to be trained and learned. I can directly create a tf.Variable to achieve this in subclass mode, but as you mentioned, that approach becomes ineffective when constructing models with the functional API. Do you know of any solutions for defining trainable weights as a layer?

xiexiaozheng avatar Aug 25 '23 06:08 xiexiaozheng

@xiexiaozheng So a few things. First, I'm not sure you want to define query_encoding like that. You set it to trainable, but Keras will consume it and convert it to a TFOpLambda layer, and Lambda layers are supposed to be stateless, meaning you won't be able to train that parameter. Lambda layers are not automatically added to the gradient calculations like in TF 1.x. That being said, the AssertionError mentioned occurs because an initial step is skipped, as we are not expecting a tf.ResourceVariable. However, I am able to make your model work with the code below. This changes the type to a tf.Tensor/tf.EagerTensor and is able to run. I'm not sure why transpose_b=True would change this result. Please note that I have placed input_layer first to make the shapes work.

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(input_layer, tf.transpose(query_encoding))
model = tf.keras.Model(inputs=input_layer, outputs=outputs)

@quic-ernst Thanks for your answer. The variable here is meant to encode the input variables, and it needs to be trained and learned. I can directly create a tf.Variable to achieve this in subclass mode, but as you mentioned, that approach becomes ineffective when constructing models with the functional API. Do you know of any solutions for defining trainable weights as a layer?

@xiexiaozheng Sorry for the late reply. The typical way, if there isn't a built-in Keras layer for your use case, is to create a subclass layer. That being said, we don't support fully subclassed layers like the one below because we don't have any insight into the layer's internal layers (depth_conv, two_convs).

class ConvTimesThree(tf.keras.layers.Layer):
    def __init__(self, **kwargs):

        super(ConvTimesThree, self).__init__(**kwargs)
        self.depth_conv = tf.keras.layers.DepthwiseConv2D(depth_multiplier=1,
                                                          kernel_size=(3, 3),
                                                          activation='relu',
                                                          name='class_conv_depth')
        self.two_convs = TwoConvs() # Another defined subclass layer

    def call(self, x, **kwargs):
        return self.depth_conv(self.two_convs(x))

Specifically, subclass layers are hard to deal with because we can't put the quantizers in the correct place when there are internal layers. In the above example, we would have just one input quantizer and one output quantizer, when we need multiple for the internal layers. This leads to a bad simulation and therefore bad results.

However, for your use case, it sounds like you have no internal layers. I believe if you create a subclass layer, you should be able to use QAT normally. AIMET won't know your layer, but it will place the default quantizers, which should work.

Typically, we handle subclassed-layer models with the model_preparer. Currently, our model preparer will fail on this, though, because we aren't expecting these non-Keras-defined internal layers. I believe this needs to be updated, as you mentioned in ticket #2425.
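
For illustration, here is a hedged sketch of the single-purpose subclass layer suggested above; the QueryEncoding name and shapes are my own placeholders, not taken from this thread. The encoding is registered via add_weight, so Keras tracks it as a trainable parameter, and the layer has no internal sub-layers:

import tensorflow as tf

class QueryEncoding(tf.keras.layers.Layer):
    """Hypothetical layer holding a trainable (num_queries, dim) encoding."""
    def __init__(self, num_queries=16, dim=256, **kwargs):
        super().__init__(**kwargs)
        self.num_queries = num_queries
        self.dim = dim

    def build(self, input_shape):
        # Trainable encoding tracked by Keras (unlike a bare tf.Variable captured by a TFOpLambda).
        self.query_encoding = self.add_weight(
            name="query_encoding",
            shape=(self.num_queries, self.dim),
            initializer="random_normal",
            trainable=True)

    def call(self, x):
        # Equivalent to tf.matmul(x, tf.transpose(query_encoding)).
        return tf.matmul(x, self.query_encoding, transpose_b=True)

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
outputs = QueryEncoding()(input_layer)
model = tf.keras.Model(inputs=input_layer, outputs=outputs)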

quic-ernst avatar Sep 05 '23 20:09 quic-ernst

@xiexiaozheng So a few things. First, I'm not sure you want to define query_encoding like that. You set it to trainable, but Keras will consume it and convert it to a TFOpLambda layer, and Lambda layers are supposed to be stateless, meaning you won't be able to train that parameter. Lambda layers are not automatically added to the gradient calculations like in TF 1.x. That being said, the AssertionError mentioned occurs because an initial step is skipped, as we are not expecting a tf.ResourceVariable. However, I am able to make your model work with the code below. This changes the type to a tf.Tensor/tf.EagerTensor and is able to run. I'm not sure why transpose_b=True would change this result. Please note that I have placed input_layer first to make the shapes work.

input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(input_layer, tf.transpose(query_encoding))
model = tf.keras.Model(inputs=input_layer, outputs=outputs)

@quic-ernst Thanks for your answer. The variable here is meant to encode the input variables, and it needs to be trained and learned. I can directly create a tf.Variable to achieve this in subclass mode, but as you mentioned, that approach becomes ineffective when constructing models with the functional API. Do you know of any solutions for defining trainable weights as a layer?

@xiexiaozheng Sorry for the late reply. The typical way, if there isn't a built-in Keras layer for your use case, is to create a subclass layer. That being said, we don't support fully subclassed layers like the one below because we don't have any insight into the layer's internal layers (depth_conv, two_convs).

class ConvTimesThree(tf.keras.layers.Layer):
    def __init__(self, **kwargs):

        super(ConvTimesThree, self).__init__(**kwargs)
        self.depth_conv = tf.keras.layers.DepthwiseConv2D(depth_multiplier=1,
                                                          kernel_size=(3, 3),
                                                          activation='relu',
                                                          name='class_conv_depth')
        self.two_convs = TwoConvs() # Another defined subclass layer

    def call(self, x, **kwargs):
        return self.depth_conv(self.two_convs(x))

Specifically, subclass layers are hard to deal with because we can't put the quantizers in the correct place when there are internal layers. In the above example, we would have just one input quantizer and one output quantizer, when we need multiple for the internal layers. This leads to a bad simulation and therefore bad results.

However, for your use case, it sounds like you have no internal layers. I believe if you create a subclass layer, you should be able to use QAT normally. AIMET won't know your layer, but it will place the default quantizers, which should work.

Typically, we handle subclassed-layer models with the model_preparer. Currently, our model preparer will fail on this, though, because we aren't expecting these non-Keras-defined internal layers. I believe this needs to be updated, as you mentioned in ticket #2425.

@quic-ernst Thank you very much for your response.

xiexiaozheng avatar Sep 06 '23 05:09 xiexiaozheng