aimet
tf.matmul operator is not supported
When using AIMET to quantize the model, we run into a problem: the tf.matmul operator is not supported.
Hi, we use the following code
import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
The tf.matmul operator in the model is not supported.
AssertionError: Mismatch between number of tensors (1) and number of input quantizers (2) for layer tf.linalg.matmul
@quic-hitameht can you please help look at this? Thanks
Have you solved that?
@tensor1to5, it looks like a bug in TF quantization. We will take a look into this. Thanks for reporting it.
Closing as this was resolved in #2411
Thanks @quic-ernst for the quick turnaround in solving this issue 💯
@quic-ernst Hi, thanks for your help. I tested the revised code from GitHub, but I run into a new problem when training the model. After creating the sim with sim = QuantizationSimModel(...) and calling sim.model.summary(), the number of non-trainable params is not zero:
Total params: 2,100,500
Trainable params: 1,100,000
Non-trainable params: 1,000,500
......
when minimizing the loss. If you're using 'model.compile()',did you forget to provide a 'loss' argument
Yes, there are still problems: the op cannot be included in training during QAT.
@tensor1to5 Hi there. For QAT training with aimet_tensorflow.keras, we have to choose which model to compile depending on which QuantScheme is used. You can refer to these Jupyter notebooks: QAT and QAT with Range Learning.
In terms of the trainable/non-trainable params, we have various parameters for maintaining state which will cause the QuantizationSimModel to have > 0 non-trainable params.
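As a rough illustration of the point about state, the sketch below uses plain Keras rather than AIMET, so the model and the numbers are assumptions chosen for demonstration only. BatchNormalization stands in for any layer that keeps internal, non-trainable state; the same counting loop could presumably be pointed at sim.model to see which variables make up its non-trainable count.

```python
import math

import tensorflow as tf


def count_params(weights):
    """Sum the number of scalar elements across a list of Keras weights."""
    return sum(math.prod(int(d) for d in w.shape) for w in weights)


# Toy model (an assumption for illustration, not the model from this thread).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),               # kernel (8x4) + bias (4): trainable
    tf.keras.layers.BatchNormalization(),   # gamma/beta trainable, moving stats not
])

trainable = count_params(model.trainable_weights)
non_trainable = count_params(model.non_trainable_weights)
print("Trainable params:", trainable)
print("Non-trainable params:", non_trainable)

# List the state variables that are excluded from gradient updates.
for w in model.non_trainable_weights:
    print(w.name, tuple(w.shape))
```

A non-zero non-trainable count on its own therefore does not mean that an op was left out of training.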
@quic-ernst Hi, thanks for your help. I have tried multiple quant_scheme configurations, but quantization-aware training still has problems.
@tensor1to5 Hi there. Are you seeing the same issue or a different one? Is the above your implementation and error? Thanks!
I added the following code to qc_quantize_wrapper.py, and after that, the code started working properly.
elif self._is_lambda_operator_layer and 'b' in kwargs and len(self.input_quantizers) == 2:
    inputs = self._quantize_activation(inputs, [self.input_quantizers[0]], True)
    kwargs['b'] = self._quantize_activation(kwargs['b'], [self.input_quantizers[1]], True)
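To make the idea behind that patch easier to follow, here is a minimal pure-Python sketch. It is not AIMET code: ToyQuantizeWrapper, make_quantizer, and multiply are hypothetical names, and scalars stand in for tensors. It only shows the dispatch logic, where the second operand arrives as the keyword argument b and each operand gets its own input quantizer.

```python
def make_quantizer(step):
    """Return a toy quantizer that rounds to the nearest multiple of `step`."""
    return lambda x: round(x / step) * step


class ToyQuantizeWrapper:
    """Hypothetical stand-in for a wrapper around a two-operand op such as matmul."""

    def __init__(self, op, input_quantizers):
        self.op = op
        self.input_quantizers = input_quantizers

    def __call__(self, inputs, **kwargs):
        if "b" in kwargs and len(self.input_quantizers) == 2:
            # Second operand was passed as a keyword argument:
            # quantize each operand with its own quantizer.
            inputs = self.input_quantizers[0](inputs)
            kwargs["b"] = self.input_quantizers[1](kwargs["b"])
        else:
            # Single-operand path: exactly one quantizer is expected.
            assert len(self.input_quantizers) == 1, (
                "Mismatch between number of tensors (1) and number of "
                f"input quantizers ({len(self.input_quantizers)})"
            )
            inputs = self.input_quantizers[0](inputs)
        return self.op(inputs, **kwargs)


def multiply(a, b):
    """Scalar stand-in for a two-operand op."""
    return a * b


wrapped = ToyQuantizeWrapper(multiply, [make_quantizer(0.5), make_quantizer(0.25)])
result = wrapped(1.3, b=2.2)  # operands are rounded to 1.5 and 2.25 before the op
print(result)
```

Without the first branch, a call that carries two operands but is treated as a single tensor falls into the assertion, which mirrors the shape of the error reported in this thread.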
@xiexiaozheng I just tried the code below and it ran.
import tempfile
import tensorflow as tf
inputs = tf.keras.Input(shape=(16,32,3))
x1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)(inputs)
x2 = tf.transpose(x1, perm=[0, 1, 3, 2])
outputs = tf.matmul(x1, x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
out = model(tf.zeros((1,16,32,3)))
from aimet_tensorflow.keras import quantsim
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme
sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
random_input = tf.random.uniform((1, 16, 32, 3))
sim.compute_encodings(lambda m, _: m(random_input), None)
with tempfile.TemporaryDirectory() as temp_dir:
    sim.export(temp_dir, "test")
print("Done.")
Could you verify you have these lines in your qc_quantize_wrapper.py file? Thank you!
https://github.com/quic/aimet/blob/9914aa0e0a8d3c8b4e5b8dcd625ce5349740cc08/TrainingExtensions/tensorflow/src/python/aimet_tensorflow/keras/quant_sim/qc_quantize_wrapper.py#L321-L330
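One way to check an installed copy, sketched with the standard library only: locate the module's source file and search it for a marker string. The helper name source_contains is hypothetical. The module path in the usage note is inferred from the URL above, and the search string is taken from the snippet quoted earlier in this thread; both are assumptions to adjust for your installation.

```python
import importlib.util


def source_contains(module_name, needle):
    """Return True/False if the module's source contains `needle`, or None if
    the module (or its source file) cannot be located."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # A parent package is missing, so the module is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    try:
        with open(spec.origin, encoding="utf-8") as f:
            return needle in f.read()
    except OSError:
        # Origin is not a readable source file (e.g. a built-in module).
        return None


# Demonstration on a standard-library module so the snippet runs anywhere.
print(source_contains("json.decoder", "JSONDecodeError"))
```

For the installed AIMET copy, the call would presumably look like source_contains("aimet_tensorflow.keras.quant_sim.qc_quantize_wrapper", "_is_lambda_operator_layer"). A result of None means the module was not found in the active environment.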
It seems you also need to add this line to common.py:
lambda_operators = ['operators.add', 'math.multiply', 'math.truediv', 'math.subtract', 'linalg.matmul']
Once you add it, the code you provided should work. It works because both inputs to your tf.matmul are Keras tensors. However, during the conversion of my model, the positional encoding constant tensor is being converted into a list. I am still investigating the cause of this error.
input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(query_encoding, input_layer, transpose_b=True)
mymodel = tf.keras.Model(inputs=input_layer, outputs=outputs)
When I create the model this way, QuantizationSimModel throws the error below. Swapping the two inputs of tf.matmul lets it pass without issues.
AssertionError: Mismatch between number of tensors (16) and number of input quantizers (1) for layer tf.linalg.matmul
@xiexiaozheng So a few things. First, I'm not sure you want to have the query_encoding like that. You set it to trainable, but Keras will consume it and convert it to a TFOpLambda layer, and Lambda layers are supposed to be stateless, meaning you won't be able to train that parameter. Lambda layers are not automatically added to the gradients for calculations like in TF 1.X.
That being said, the AssertionError you mentioned occurs because an initial step is skipped over, as we are not expecting a tf.ResourceVariable. However, I am able to get your model to work with the code below. This changes the type to a tf.Tensor/tf.EagerTensor and is able to run. I'm not sure why transpose_b=True would change this result. Please note that I have input_layer first to make the shapes work.
input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
query_encoding = tf.Variable(initial_value= tf.random_normal_initializer()(shape=(16, 256),dtype='float32'),trainable=True)
outputs = tf.matmul(input_layer, tf.transpose(query_encoding))
model = tf.keras.Model(inputs=input_layer, outputs=outputs)
@quic-ernst Hi, Thanks for your help. About:
Lambda layers are not automatically added to the gradients for calculations like in TF 1.X.
Because our deployment platform is an 8-bit/16-bit DSP core, every operator needs to participate in quantization-aware training. Is there a solution to this problem?
@quic-ernst Thanks for your answer. The variable here is meant for encoding the input, and it needs to be trained. I can create a tf.Variable directly to achieve this in a subclassed model, but, as you mentioned, this approach does not work when building models with the functional API. Do you know of any way to define trainable weights as a layer?
@xiexiaozheng Sorry for the late reply. If Keras has no built-in layer for your use case, the typical approach is to create a subclassed layer. That being said, we don't support fully subclassed layers like the one below, because we have no insight into the layer's internal layers (depth_conv, two_convs).
class ConvTimesThree(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ConvTimesThree, self).__init__(**kwargs)
        self.depth_conv = tf.keras.layers.DepthwiseConv2D(depth_multiplier=1,
                                                          kernel_size=(3, 3),
                                                          activation='relu',
                                                          name='class_conv_depth')
        self.two_convs = TwoConvs()  # Another defined subclass layer

    def call(self, x, **kwargs):
        return self.depth_conv(self.two_convs(x))
Specifically, subclass layers are hard to deal with because we can't put the quantizers in the correct place if there are internal layers. In the above example, we would just have one input and output quantizer when we need multiple for the internal layers. This leads to a bad simulation and therefore bad results.
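A toy numeric sketch of why quantizer placement matters, using scalars and a made-up rounding quantizer (none of this is AIMET code, and the two "internal layers" are hypothetical). Quantizing after every internal layer, as the target hardware would, can give a different result from quantizing only at the boundary of the outer layer.

```python
def quantize(x, step=0.5):
    """Toy quantizer: round to the nearest multiple of `step`."""
    return round(x / step) * step


def inner_a(x):
    """First hypothetical internal layer."""
    return x * 1.3


def inner_b(x):
    """Second hypothetical internal layer."""
    return x * 1.3


x = 1.0

# Quantizer after each internal layer (what the simulation should model).
per_layer = quantize(inner_b(quantize(inner_a(quantize(x)))))

# Quantizers only at the input and output of the outer layer.
boundary_only = quantize(inner_b(inner_a(quantize(x))))

print(per_layer, boundary_only)
```

The two results differ, so a simulation that only sees the outer layer's boundary does not reflect the rounding that happens between the internal layers.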
However, for your use case, it sounds like you have no internal layers. I believe if you do create a subclassed layer, then you should be able to use QAT normally. AIMET won't know your layer, but it will place the default quantizers, which should work.
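A minimal sketch of what such a subclassed layer could look like for the query encoding discussed in this thread. The class name QueryEncoding and its num_queries argument are hypothetical, and whether AIMET's default quantizers behave well on it has not been verified here. The weight is created with add_weight in build(), so Keras tracks it as trainable, and the layer contains no internal Keras layers.

```python
import tensorflow as tf


class QueryEncoding(tf.keras.layers.Layer):
    """Hypothetical layer holding a trainable (num_queries, channels) encoding."""

    def __init__(self, num_queries=16, **kwargs):
        super().__init__(**kwargs)
        self.num_queries = num_queries

    def build(self, input_shape):
        # Registering the weight here makes Keras track it for training.
        self.query_encoding = self.add_weight(
            name="query_encoding",
            shape=(self.num_queries, int(input_shape[-1])),
            initializer=tf.random_normal_initializer(),
            trainable=True,
        )

    def call(self, inputs):
        # (batch, 16, 16, 256) x (256, num_queries) -> (batch, 16, 16, num_queries)
        return tf.matmul(inputs, self.query_encoding, transpose_b=True)


input_layer = tf.keras.Input([16, 16, 256], batch_size=3)
outputs = QueryEncoding()(input_layer)
model = tf.keras.Model(inputs=input_layer, outputs=outputs)

print([tuple(w.shape) for w in model.trainable_weights])
```

As in the earlier workaround, input_layer is the first operand so that the shapes line up.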
Typically, we handle models with subclassed layers using the model_preparer. Currently, though, our model preparer will fail on this because we aren't expecting these non-Keras-defined internal layers. I believe this needs to be updated, as you mentioned in ticket #2425.
@quic-ernst Thank you very much for your response.