
QAT (quantization aware training) Support quantizing models recursively

Open CRosero opened this issue 4 years ago • 24 comments

Describe the bug
I'm doing transfer learning and would like to quantize my model at the end. The problem is that when I try to use the quantize_model() function (which is used successfully in numerous tutorials and videos), I get an error. How am I supposed to do quantization for transfer learning (using a previously built model as a feature extractor)?

System information

TensorFlow installed from (source or binary): pip

TensorFlow version: tf-nightly 2.2.0

TensorFlow Model Optimization version: 0.3.0

Python version: 3.7.7

Describe the expected behavior
I expect the model to be quantized successfully, with no error messages.

Describe the current behavior
I get the error: "ValueError: Quantizing a tf.keras Model inside another tf.keras Model is not supported."

Code to reproduce the issue
Can be found here.

CRosero avatar May 04 '20 16:05 CRosero

My workaround is to quantize both models separately and then combine them into a normal Keras model.

from tensorflow.keras import Input, Model
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# Quantize each sub-model on its own...
q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)

# ...then chain them together in a plain functional model.
inputs = Input(...)
h = q_base_model(inputs)
outputs = q_head_model(h)
full_model = Model(inputs, outputs)
full_model.compile(...)
full_model.fit(...)

I'm not sure if this is the correct approach, but it works for me.

kmkolasinski avatar May 04 '20 17:05 kmkolasinski

@alanchiao @nutsiepully Could you take a look? Thanks!

miaout17 avatar May 05 '20 22:05 miaout17

Hi @CRosero,

We haven't added support for quantizing Keras models within models yet. This is possible, and something we intend to do in the future.

In the meantime, @kmkolasinski is right: that's the approach to use when nesting models. Just quantize each of the models you are interested in separately.

Thanks @kmkolasinski!

nutsiepully avatar May 07 '20 07:05 nutsiepully

Thanks! @nutsiepully @kmkolasinski

Does quantizing the models separately and then combining them not produce a fully quantized model?

base_model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation=tf.nn.relu),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
])

head_model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(None, 2028)),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

quantize_model = tfmot.quantization.keras.quantize_model

q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)

q_full_model = keras.Sequential([q_base_model, q_head_model])

q_full_model.compile(...)
q_full_model.fit(...)

converter = tf.lite.TFLiteConverter.from_keras_model(q_full_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

When I tried to convert it, I got the error message:

ValueError("Unsupported tf.dtype {0}".format(tf_dtype))

Is q_full_model not fully quantized?

Kyle719 avatar May 14 '20 04:05 Kyle719

Hi @Kyle719,

I tried reproducing this, but I didn't see any errors. It converted just fine.

Please make sure you use tf-nightly. This should explain how the conversion is done.

nutsiepully avatar Jun 03 '20 06:06 nutsiepully

Hey @nutsiepully, thanks for the insight. Would you mind keeping this issue up to date with any changes in status/priority/roadmap, etc., regarding this capability going forward? Thanks!

willbattel avatar Jun 06 '20 23:06 willbattel

Will update it once we add support for it.

nutsiepully avatar Jun 08 '20 22:06 nutsiepully

@kmkolasinski Thanks for your suggestion. I am trying it out but unfortunately can't get it to work. My code looks similar to that of @Kyle719, but I'm already getting a ValueError on q_head_model = quantize_model(head_model), saying

model must contain at least one layer which have been annotated with quantize_annotate*. There are no layers to quantize.

In the saved versions of the colab, this is labeled "Initial attempt". Even after adding what the error suggests (version "quantize_annotate change"), the error doesn't go away.

@nutsiepully and others, do you happen to have any suggestions for a solution? (FYI, I shared the link so you can try the code and corresponding solutions out directly in the colab; hope that makes it easier.)

CRosero avatar Jun 17 '20 09:06 CRosero

@CRosero - I fixed the code in your colab. Your Sequential model was not constructed correctly - it was missing parentheses, so it did not actually have any layers. That's why it was failing.

Also, after quantize_annotate*, you only need to use quantize_apply, not quantize_model again (though the latter still works here).
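
For reference, a minimal sketch of that annotate-then-apply flow (hypothetical model; here only the Dense layers are marked for quantization):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def annotate_dense(layer):
  # Mark only Dense layers for quantization; pass everything else through.
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

annotated_model = tf.keras.models.clone_model(model, clone_function=annotate_dense)
q_model = tfmot.quantization.keras.quantize_apply(annotated_model)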

I understand the complexity of using a new API, but it's generally not feasible for me to debug user code.

nutsiepully avatar Jun 23 '20 02:06 nutsiepully

Thank you very much @nutsiepully for your patient help! I didn't notice that at all... I made the corresponding changes and now it's working :)

CRosero avatar Jun 23 '20 06:06 CRosero

Thanks @nutsiepully! 'Transfer learning + QAT' is working well with the code below (I used VGG19 because it does not have batch normalization layers, which are not yet supported for QAT).

I have one more question now: how can I follow the steps introduced on the TensorFlow page? https://www.tensorflow.org/model_optimization/guide/quantization/training_example

The steps:

  1. Train model (no quantization related)
  2. Fine tune with quantization aware training for just an epoch
  3. Convert it to tflite

Is it possible to fine-tune a model with QAT by quantizing models recursively?

base_model = tf.keras.applications.VGG19(input_shape=IMG_SHAPE,
                                         include_top=False,
                                         weights='imagenet')

head_model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(5, 5, 512)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1)
])

import os
import tempfile
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

q_base_model = quantize_model(base_model)
q_head_model = quantize_model(head_model)

original_inputs = tf.keras.Input(IMG_SHAPE)
output1 = q_base_model(original_inputs)
output2 = q_head_model(output1)

q_aware_model = tf.keras.Model(inputs=original_inputs, outputs=output2)

base_learning_rate = 0.0001
q_aware_model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
                      loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                      metrics=['accuracy'])

initial_epochs = 1
validation_steps = 20

history = q_aware_model.fit(train_batches, epochs=initial_epochs)

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

_, quant_file = tempfile.mkstemp('.tflite')
with open(quant_file, 'wb') as f:
    f.write(quantized_tflite_model)
print("Quantized model in Mb:", os.path.getsize(quant_file) / float(2**20))

Kyle719 avatar Jun 29 '20 08:06 Kyle719

Hi @Xhark, can you comment on whether nested models are supported now?

teijeong avatar Apr 14 '21 10:04 teijeong

We don't support fully recursive quantization yet, but you can now quantize a model that contains a sub-model.

e.g.)

q_base_model = quantize_model(base_model)

original_inputs = tf.keras.Input(IMG_SHAPE)
x = q_base_model(original_inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
output = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs=original_inputs, outputs=output)

q_aware_model = quantize_model(model)

--

This example was not supported before, but it works now.

Xhark avatar Apr 16 '21 06:04 Xhark

Thanks @Xhark.

It seems to me the last line, q_aware_model = quantize_model(model), is not needed. q_base_model is already quantized, right?

nutsiepully avatar Apr 16 '21 23:04 nutsiepully

q_base_model is already quantized, but the last line is needed to quantize the layers outside q_base_model (the GlobalAveragePooling2D and Dense).

Xhark avatar Dec 03 '21 00:12 Xhark

This works for me; maybe it is useful for someone!

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def apply_quantization(layer):
  # Annotate only the layer types you want quantized; leave the rest as-is.
  if isinstance(layer, tf.keras.layers.Dense):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

def create_quantization_model(model):
  layers = []
  for i in range(len(model.layers)):
    if isinstance(model.layers[i], tf.keras.models.Model):
      # Nested sub-model: annotate its layers via clone_model, then apply.
      quant_sub_model = tf.keras.models.clone_model(
          model.layers[i], clone_function=apply_quantization)
      layers.append(tfmot.quantization.keras.quantize_apply(quant_sub_model))
    else:
      layers.append(apply_quantization(model.layers[i]))
  quant_model = tf.keras.models.Sequential(layers)
  return quant_model
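
A hypothetical usage sketch (placeholder model and training data; compile and fine-tune the returned model as usual):

# Quantize a model that nests sub-models, then fine-tune briefly for QAT.
q_model = create_quantization_model(model)
q_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
q_model.fit(train_batches, epochs=1)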

aqibsaeed avatar Jan 12 '22 10:01 aqibsaeed

Any tips on quantizing the Pix2Pix generator? I've used this official tutorial as a guide, and have attempted the following to no avail:

def downsample(filters, size, apply_batchnorm=True):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False)
      ) 
    )

  if apply_batchnorm:
    result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.BatchNormalization()
      )
    )

  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
        tf.keras.layers.LeakyReLU()
      )
  )

  return result

def upsample(filters, size, apply_dropout=False):
  initializer = tf.random_normal_initializer(0., 0.02)

  result = tf.keras.Sequential()
  result.add(
    tfmot.quantization.keras.quantize_annotate_layer(
      tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
                                      padding='same',
                                      kernel_initializer=initializer,
                                      use_bias=False)
      )
  )

  result.add(
      tfmot.quantization.keras.quantize_annotate_layer(
          tf.keras.layers.BatchNormalization()
      )
  )

  if apply_dropout:
      result.add(
        tfmot.quantization.keras.quantize_annotate_layer(
          tf.keras.layers.Dropout(0.5)
        )
      )

  result.add(
    tfmot.quantization.keras.quantize_annotate_layer(
      tf.keras.layers.ReLU()
    )
  )

  return result

def Generator():
  inputs = tf.keras.layers.Input(shape=[512, 512, 3]) # Old: 256

  down_stack = [
    downsample(128, 4, apply_batchnorm=False),  # (batch_size, 128, 128, 64)
    downsample(256, 4),  # (batch_size, 64, 64, 128)
    downsample(512, 4),  # (batch_size, 32, 32, 256)
    downsample(1024, 4),  # (batch_size, 16, 16, 512)
    downsample(1024, 4),  # (batch_size, 8, 8, 512)
    downsample(1024, 4),  # (batch_size, 4, 4, 512)
    downsample(1024, 4),  # (batch_size, 2, 2, 512)
    downsample(1024, 4),  # (batch_size, 1, 1, 512)
  ]

  up_stack = [
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 2, 2, 1024)
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 4, 4, 1024)
    upsample(1024, 4, apply_dropout=True),  # (batch_size, 8, 8, 1024)
    upsample(1024, 4),  # (batch_size, 16, 16, 1024)
    upsample(512, 4),  # (batch_size, 32, 32, 512)
    upsample(256, 4),  # (batch_size, 64, 64, 256)
    upsample(128, 4),  # (batch_size, 128, 128, 128)
  ]

  initializer = tf.random_normal_initializer(0., 0.02)
  last = tf.keras.layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                         strides=2,
                                         padding='same',
                                         kernel_initializer=initializer,
                                         activation='tanh')  # (batch_size, 256, 256, 3)

  x = inputs

  # Downsampling through the model
  skips = []
  for down in down_stack:
    x = down(x)
    skips.append(x)

  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for up, skip in zip(up_stack, skips):
    x = up(x)
    x = tf.keras.layers.Concatenate()([x, skip])

  x = last(x)

  # Model
  model = tf.keras.Model(inputs=inputs, outputs=x)

  # Quantize
  q_model = tfmot.quantization.keras.quantize_apply(model)

  return q_model

This current setup gives me the error "ValueError: model must contain at least one layer which have been annotated with quantize_annotate*. There are no layers to quantize." Then, when I call quantize_apply on the Sequential models inside the up/downsample functions, the error changes to "ValueError: model must be a built model. been built yet. Please call model.build(input_shape) before quantizing your model" (which makes sense). Is it possible to quantize with this model structure? Thanks in advance!

frytoli avatar Apr 06 '22 16:04 frytoli

Can you try creating your model without any quantization first? Then call q_model = tf.keras.models.clone_model(model, clone_function=apply_quantization), where apply_quantization should annotate every layer you want to quantize with tfmot.quantization.keras.quantize_annotate_layer.
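
A minimal sketch of that suggestion (the layer whitelist is an assumption; adjust it to the layers you actually want quantized):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def apply_quantization(layer):
  # Annotate conv and dense layers; leave everything else untouched.
  if isinstance(layer, (tf.keras.layers.Conv2D,
                        tf.keras.layers.Conv2DTranspose,
                        tf.keras.layers.Dense)):
    return tfmot.quantization.keras.quantize_annotate_layer(layer)
  return layer

model = Generator()  # build the plain, unquantized model first
# Note: clone_model applies clone_function to top-level layers only, so
# layers inside nested Sequential sub-models pass through unannotated.
annotated = tf.keras.models.clone_model(model, clone_function=apply_quantization)
q_model = tfmot.quantization.keras.quantize_apply(annotated)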

aqibsaeed avatar Apr 06 '22 17:04 aqibsaeed

Thanks for the quick response! That doesn't throw an error, but it doesn't look like it quantizes the layers created within the upsample and downsample functions. Is there any way to also get those layers?

Model: "model_2"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_3 (InputLayer)           [(None, 512, 512, 3  0           []                               
                                )]                                                                
                                                                                                  
 sequential_34 (Sequential)     (None, 256, 256, 12  6144        ['input_3[0][0]']                
                                8)                                                                
                                                                                                  
 sequential_35 (Sequential)     (None, 128, 128, 25  525312      ['sequential_34[1][0]']          
                                6)                                                                
                                                                                                  
 sequential_36 (Sequential)     (None, 64, 64, 512)  2099200     ['sequential_35[1][0]']          
                                                                                                  
 sequential_37 (Sequential)     (None, 32, 32, 1024  8392704     ['sequential_36[1][0]']          
                                )                                                                 
                                                                                                  
 sequential_38 (Sequential)     (None, 16, 16, 1024  16781312    ['sequential_37[1][0]']          
                                )                                                                 
                                                                                                  
 sequential_39 (Sequential)     (None, 8, 8, 1024)   16781312    ['sequential_38[1][0]']          
                                                                                                  
 sequential_40 (Sequential)     (None, 4, 4, 1024)   16781312    ['sequential_39[1][0]']          
                                                                                                  
 sequential_41 (Sequential)     (None, 2, 2, 1024)   16781312    ['sequential_40[1][0]']          
                                                                                                  
 sequential_42 (Sequential)     (None, 4, 4, 1024)   16781312    ['sequential_41[1][0]']          
                                                                                                  
 concatenate_14 (Concatenate)   (None, 4, 4, 2048)   0           ['sequential_42[1][0]',          
                                                                  'sequential_40[1][0]']          
                                                                                                  
 sequential_43 (Sequential)     (None, 8, 8, 1024)   33558528    ['concatenate_14[1][0]']         
                                                                                                  
 concatenate_15 (Concatenate)   (None, 8, 8, 2048)   0           ['sequential_43[1][0]',          
                                                                  'sequential_39[1][0]']          
                                                                                                  
 sequential_44 (Sequential)     (None, 16, 16, 1024  33558528    ['concatenate_15[1][0]']         
                                )                                                                 
                                                                                                  
 concatenate_16 (Concatenate)   (None, 16, 16, 2048  0           ['sequential_44[1][0]',          
                                )                                 'sequential_38[1][0]']          
                                                                                                  
 sequential_45 (Sequential)     (None, 32, 32, 1024  33558528    ['concatenate_16[1][0]']         
                                )                                                                 
                                                                                                  
 concatenate_17 (Concatenate)   (None, 32, 32, 2048  0           ['sequential_45[1][0]',          
                                )                                 'sequential_37[1][0]']          
                                                                                                  
 sequential_46 (Sequential)     (None, 64, 64, 512)  16779264    ['concatenate_17[1][0]']         
                                                                                                  
 concatenate_18 (Concatenate)   (None, 64, 64, 1024  0           ['sequential_46[1][0]',          
                                )                                 'sequential_36[1][0]']          
                                                                                                  
 sequential_47 (Sequential)     (None, 128, 128, 25  4195328     ['concatenate_18[1][0]']         
                                6)                                                                
                                                                                                  
 concatenate_19 (Concatenate)   (None, 128, 128, 51  0           ['sequential_47[1][0]',          
                                2)                                'sequential_35[1][0]']          
                                                                                                  
 sequential_48 (Sequential)     (None, 256, 256, 12  1049088     ['concatenate_19[1][0]']         
                                8)                                                                
                                                                                                  
 concatenate_20 (Concatenate)   (None, 256, 256, 25  0           ['sequential_48[1][0]',          
                                6)                                'sequential_34[1][0]']          
                                                                                                  
 quantize_annotate_28 (Quantize  (None, 512, 512, 3)  12291      ['concatenate_20[1][0]']         
 Annotate)                                                                                        
                                                                                                  
==================================================================================================
Total params: 217,641,475
Trainable params: 217,619,715
Non-trainable params: 21,760
__________________________________________________________________________________________________

frytoli avatar Apr 06 '22 18:04 frytoli

I think quantization does not really go recursively for models that contain other models (in your case, the main model contains other Sequential models). Did you try passing your model to the create_quantization_model(model) function mentioned earlier in this thread? I think the solution would be to iterate over the model's layers and, if you encounter a Sequential sub-model, iterate over its layers too to annotate them.

aqibsaeed avatar Apr 06 '22 18:04 aqibsaeed

I did manage to get that working for me with some additional layer types in the apply_quantization function (I'm still learning here!). But I receive the following error, I think due to the Concatenate layers between the sub-models:

ValueError: A merge layer should be called on a list of inputs. Received: inputs=Tensor("Placeholder:0", shape=(None, 4, 4, 1024), dtype=float32) (not a list of tensors)

I've also tested changing quant_model = tf.keras.Sequential(layers) to quant_model = tf.keras.Model(layers) in create_quantization_model, and it runs without issue. However, when I then call the new quantized model to view its summary, like q_model(inputs=inputs), I receive this error:

Unimplemented `tf.keras.Model.call()`: if you intend to create a `Model` with the Functional API, please provide `inputs` and `outputs` arguments. Otherwise, subclass `Model` with an overridden `call()` method.

Thanks again for your help.

frytoli avatar Apr 06 '22 20:04 frytoli

Great. Check this guide to implement call(): https://keras.io/guides/customizing_what_happens_in_fit/ There is a GAN example at the end of the page that would be useful.
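
A rough sketch of that idea (an assumption, not a drop-in fix; the skip connections are wired up explicitly in call()):

import tensorflow as tf

class QuantizedPix2Pix(tf.keras.Model):
  # Hypothetical wrapper: hold the quantized sub-models and define the
  # forward pass explicitly instead of using a Sequential container.
  def __init__(self, down_stack, up_stack, last):
    super().__init__()
    self.down_stack = down_stack
    self.up_stack = up_stack
    self.last = last

  def call(self, inputs):
    x = inputs
    skips = []
    for down in self.down_stack:
      x = down(x)
      skips.append(x)
    for up, skip in zip(self.up_stack, reversed(skips[:-1])):
      x = up(x)
      x = tf.concat([x, skip], axis=-1)  # skip connection
    return self.last(x)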

aqibsaeed avatar Apr 06 '22 20:04 aqibsaeed

Wonderful! Thanks again!

frytoli avatar Apr 07 '22 00:04 frytoli


Hi @frytoli, what changes did you make in the apply_quantization function to apply quantization to the sub-modules (conv layers) of the upsample and downsample blocks too? I am facing a similar issue.

ashwinv99 avatar Jun 27 '22 19:06 ashwinv99