
How to apply several SGD steps within the inner loop?

Open davidjimenezphd opened this issue 4 years ago • 13 comments

Hi @mari-linhares, thanks for the repo! We are building on your code to implement a somewhat more general version of MAML that includes a batch of tasks within the inner loop and several steps of gradient descent w.r.t. the parameters of each task. However, we are stuck on how to add several SGD steps within your code using TensorFlow 2.0. Do you have any idea of how to do that?

davidjimenezphd avatar Sep 20 '19 08:09 davidjimenezphd

I've also been trying to build off this repo, but have encountered the same issue. It seems that updating the weights manually as done here makes them non-trainable. @davidjimenezphd Have you found a workaround? Without multiple inner loop SGD steps, this repo doesn't actually run the full version of MAML.

Alekxos avatar Dec 02 '19 01:12 Alekxos

Hi @Alekxos. Yes, we found a solution based on "watch"-ing some variables in the gradient tape. Give me some time and I'll try to upload the solution.

davidjimenezphd avatar Dec 02 '19 07:12 davidjimenezphd

It's definitely a bug in TensorFlow. We worked around it by doing the following (a sketch of the idea follows below):

  • build a copy of the meta-network and train it for one step (inner training); now the weights of the copy are not trainable (N=1). Here is where the patch begins:
  • make a new copy of the meta-model and initialize it
  • manually set the weights of the new copy (you need to manually iterate through the layers) with the weights from the trained copy; now you can use this copy and train it again (N>1). You need to repeat this for every training step...

This is a bit hacky and needs some extra computation (for copying and forwarding through the net), but TensorFlow has so many open issues that we will use this as long as the bug exists ;-) (and I think it will be there for a while...)

See our Tensorflow issue: https://github.com/tensorflow/tensorflow/issues/34335
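
A minimal sketch of that workaround, as I read it (MetaModel is a stand-in for the repo's model class, forward is called once on each copy just to create its variables, and MSE is a stand-in loss):

import tensorflow as tf

def inner_train(meta_model, x, y, n_steps, lr_inner):
    current = meta_model
    for _ in range(n_steps):
        with tf.GradientTape() as tape:
            pred = current.forward(x)
            loss = tf.reduce_mean(tf.square(pred - y))  # stand-in loss
        grads = tape.gradient(loss, current.trainable_variables)
        # Build a fresh copy so its weights are tf.Variables again, then
        # seed it with the SGD-updated values of the trained copy.
        # (Assumes get_weights() and trainable_variables come back in the
        # same order, as they do for a plain stack of Dense layers.)
        fresh = MetaModel()
        fresh.forward(x)  # run once so the variables get created
        fresh.set_weights([w - lr_inner * g.numpy()
                           for w, g in zip(current.get_weights(), grads)])
        current = fresh
    return current

Note that because set_weights goes through numpy, the outer tape cannot differentiate through these inner updates; that limitation is what the rest of the thread is about.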

shufflebyte avatar Dec 02 '19 10:12 shufflebyte

Hi @shufflebyte

This is actually not a tensorflow bug.

def copy_model(model, x):
    copied_model = MetaModel()
    copied_model.forward(x)
    copied_model.set_weights(model.get_weights())
    return copied_model

In this function, Model.get_weights actually returns numpy arrays, and Model.set_weights overwrites weight values from those numpy arrays rather than replacing the trainable variables with another set of variables. Therefore, in effect, this function does not copy a model the way we expect.

This is not problematic in this repo because we do manual replacement:

k = 0
model_copy = copy_model(model, x)
for j in range(len(model_copy.layers)):
    # These assignments replace each layer's tf.Variable kernel/bias with
    # plain tensors that are still differentiable w.r.t. model's variables,
    # which is what lets the outer tape compute meta-gradients.
    model_copy.layers[j].kernel = tf.subtract(model.layers[j].kernel,
                                              tf.multiply(lr_inner, gradients[k]))
    model_copy.layers[j].bias = tf.subtract(model.layers[j].bias,
                                            tf.multiply(lr_inner, gradients[k + 1]))
    k += 2
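
As a quick standalone illustration of what this replacement does (not from the repo; a toy layer instead of MetaModel, and tracking details can vary across TF versions):

import tensorflow as tf

layer = tf.keras.layers.Dense(3)
layer.build((None, 2))
print(type(layer.kernel))  # a tf.Variable

# Plain attribute assignment replaces the Variable with the result tensor.
# Under a tape the subtraction stays differentiable w.r.t. the original
# variable, but the new kernel is itself just a tensor, so a later tape
# will not watch it automatically -- which is why a second inner step fails.
layer.kernel = layer.kernel - 0.1 * tf.ones_like(layer.kernel)
print(type(layer.kernel))  # an EagerTensor now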

llan-ml avatar Feb 21 '20 18:02 llan-ml

Hi @llan-ml, I have been stuck on this issue for a while. Surely we can update the parameters of the copied model manually, but if we need several SGD steps in the inner loop to update the copied model several times, we need to compute the gradients on the copied model. But there are no trainable variables in the copied model, so GradientTape cannot compute the gradients.

Actually, I tried to directly apply a tf.keras.optimizers.SGD() for updating the fast weights; this keeps the variables in the copied model trainable.
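
For reference, a sketch of that route (my reading, with a hypothetical model_copy and a stand-in MSE loss). One caveat: apply_gradients updates the variables in place, and an outer tape does not differentiate through in-place variable updates, so this gives a first-order (FOMAML-style) approximation rather than full second-order MAML:

inner_optimizer = tf.keras.optimizers.SGD(learning_rate=lr_inner)

for _ in range(n_inner_steps):
    with tf.GradientTape() as tape:
        pred = model_copy.forward(x)
        loss = tf.reduce_mean(tf.square(pred - y))  # stand-in loss
    grads = tape.gradient(loss, model_copy.trainable_variables)
    # Variables stay trainable, but the outer tape cannot backprop
    # through this in-place assignment.
    inner_optimizer.apply_gradients(zip(grads, model_copy.trainable_variables))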

HilbertXu avatar Feb 24 '20 09:02 HilbertXu

Hi @davidjimenezphd

Have you found out how to add a batch of tasks and several SGD steps?

I have been stuck on this problem for some days. I tried to use two tapes to watch the whole batch process, and used the stop_recording() function during the batch process to control it. It seems I can add several SGD steps to update the fast weights several times, but I failed to compute the gradients of the whole batch; it returns a list of None. Could you please tell me how you solved this problem?

HilbertXu avatar Feb 24 '20 09:02 HilbertXu

Hi @HilbertXu

In the case of multiple inner gradient steps, you need to manually watch the weight tensors (by that point they are no longer tf.Variables), and then the tape can compute their gradients.
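
A minimal sketch of what I mean (using the manually updated model_copy from the snippet above, whose kernels and biases are now plain tensors, and a stand-in MSE loss):

# Collect the fast weights of the copy; these are tensors, not Variables.
fast_weights = [w for layer in model_copy.layers
                  for w in (layer.kernel, layer.bias)]

with tf.GradientTape(watch_accessed_variables=False) as tape:
    tape.watch(fast_weights)  # watch the tensors explicitly
    pred = model_copy.forward(x)
    loss = tf.reduce_mean(tf.square(pred - y))  # stand-in loss
gradients = tape.gradient(loss, fast_weights)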

llan-ml avatar Feb 24 '20 10:02 llan-ml

Hi llan-ml

Thanks for your help, I will try it later.

HilbertXu avatar Feb 24 '20 10:02 HilbertXu

Hi @HilbertXu

I wrote a toy MAML-like script, which may be helpful for you. Please let me know if you find that the implementation is correct and works in more practical situations.

llan-ml avatar Feb 24 '20 13:02 llan-ml

Hi @llan-ml

It says I don't have access to your files. Could you please help me with this?

Maybe we can chat on WeChat or by email? My SS server has been blocked, so it's hard for me to access the Colab.

HilbertXu avatar Feb 24 '20 14:02 HilbertXu

I forgot to enable sharing of that link, and now it should be accessible. Also, feel free to reach me by the email in my profile.

llan-ml avatar Feb 24 '20 15:02 llan-ml

(Quoting @llan-ml's explanation of copy_model and the manual weight replacement from above.)

But I also get an error: why does model.get_weights() return an empty list?

with tf.GradientTape() as support_tape:
    support_tape.watch(model.trainable_variables)
    y_pred = model.forward(x1[i])
    support_loss = compute_loss(y1, y_pred)

gradients = support_tape.gradient(support_loss, model.trainable_variables)
# inner_optimizer.apply_gradients(zip(gradients, model.trainable_variables))
k = 0
for j in range(len(model.layers)):
    model.layers[j].kernel = tf.subtract(model.layers[j].kernel, tf.multiply(lr_inner, gradients[k]))
    model.layers[j].bias = tf.subtract(model.layers[j].bias, tf.multiply(lr_inner, gradients[k + 1]))
    k += 2
print(model.get_weights())

Runist avatar Jul 15 '20 04:07 Runist

(Quoting @llan-ml's toy script from the shared link:)

with tf.GradientTape() as outer_tape:
  copied_model = model  # just an alias here; the actual copy is created by copy_from below
  for _ in range(2):
    with tf.GradientTape(watch_accessed_variables=False) as inner_tape:
      inner_tape.watch(copied_model.inner_weights)  # watch explicitly; after the first copy these are plain tensors
      inner_loss = compute_loss(copied_model, x, y)
    inner_grads = inner_tape.gradient(inner_loss, copied_model.inner_weights)
    # copy_from builds a new model whose weights are the SGD-updated tensors
    copied_model = MetaModel.copy_from(copied_model, inner_grads)
  outer_loss = compute_loss(copied_model, x, y)
outer_grads = outer_tape.gradient(outer_loss, model.inner_weights)
optimizer.apply_gradients(zip(outer_grads, model.inner_weights))

And I tried your code: model and copied_model are the same object. When you update copied_model, it also updates model.

Runist avatar Jul 15 '20 05:07 Runist