
What is the usage of SECOND_ORDER_GRAD_ITER=0 and self.total_loss1?


What is the purpose of SECOND_ORDER_GRAD_ITER=0 and self.total_loss1? Regarding SECOND_ORDER_GRAD_ITER=0:

    if step == SECOND_ORDER_GRAD_ITER:
        second_grad = sess.run(self.second_grad_on)

If pre-training on the large-scale dataset has already been finished, I think this is useless in the meta-transfer learning step. As for self.total_loss1:

    self.total_loss1 = tf.reduce_sum(self.lossesa) / tf.to_float(self.META_BATCH_SIZE)
    self.pretrain_op = tf.train.AdamOptimizer(self.META_LR).minimize(self.total_loss1)

    self.gvs = self.opt.compute_gradients(self.weighted_total_losses2)
    self.metatrain_op = self.opt.apply_gradients(self.gvs)

    sess.run(self.metatrain_op, feed_dict=feed_dict)

In this meta-transfer learning step, total_loss1 is never used by any optimizer. Is that correct?

qhanson · Jun 02 '20

MAML requires second-order gradients, which are expensive to compute. For fast training, we use a first-order approximation of the gradients at the beginning of training, and after SECOND_ORDER_GRAD_ITER steps we switch to the full second-order gradients.

Therefore, SECOND_ORDER_GRAD_ITER decides for how many steps the gradients are approximated with the first-order method.
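In MAML-style TensorFlow 1.x code, the first-order approximation is typically obtained by stopping the gradient through the inner-loop updates, so that no second-order terms flow into the meta-gradient. A minimal sketch of such a switch (the function and flag names below are illustrative, not the exact MZSR implementation):

    import tensorflow as tf  # TensorFlow 1.x

    def inner_update(weights, task_loss, task_lr, use_second_order):
        """One inner-loop adaptation step of a MAML-style meta-learner."""
        names = list(weights.keys())
        grads = tf.gradients(task_loss, [weights[n] for n in names])
        if not use_second_order:
            # First-order approximation: treat the inner-loop gradients as
            # constants, so the meta-gradient ignores all second-order terms.
            grads = [tf.stop_gradient(g) for g in grads]
        # The adapted (fast) weights are then used to compute the meta-loss.
        return {n: weights[n] - task_lr * g for n, g in zip(names, grads)}

With use_second_order set to False this behaves like first-order MAML; once it is switched on (here, after SECOND_ORDER_GRAD_ITER training steps), backpropagating the meta-loss through the adapted weights also differentiates through the inner gradients.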

As for self.total_loss1, you are right: it is not used for the training. You may ignore that loss and the corresponding optimizer.

JWSoh · Jun 08 '20

Dear Sir, amazing work, congratulations! I have a question. Could you kindly provide the full checkpoint path of the trained large-scale model, so that I can use it as the pre-trained model for meta-transfer training? Currently it says that there is no checkpoint file. I'm waiting for your reply. Thanks in advance.

BassantTolba1234 · Dec 14 '20

Could you please kindly explain how this loss weight is calculated?

    def get_loss_weights(self):
        loss_weights = tf.ones(shape=[self.TASK_ITER]) * (1.0 / self.TASK_ITER)
        decay_rate = 1.0 / self.TASK_ITER / (10000 / 3)
        min_value = 0.03 / self.TASK_ITER

        loss_weights_pre = tf.maximum(loss_weights[:-1] - (tf.multiply(tf.to_float(self.global_step), decay_rate)), min_value)

        loss_weight_cur = tf.minimum(loss_weights[-1] + (tf.multiply(tf.to_float(self.global_step), (self.TASK_ITER - 1) * decay_rate)), 1.0 - ((self.TASK_ITER - 1) * min_value))
        loss_weights = tf.concat([[loss_weights_pre], [[loss_weight_cur]]], axis=1)
        return loss_weights

BassantTolba1234 · Jan 06 '21
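Regarding the last question: get_loss_weights builds a linear annealing schedule over the per-step losses of the inner loop. At global_step 0 every one of the TASK_ITER inner iterations is weighted equally with 1/TASK_ITER; as training proceeds, the weights of the intermediate iterations decay toward min_value while the weight of the final iteration grows by the same total amount, so the weights always sum to 1. A minimal NumPy sketch of the same schedule (TASK_ITER=5 is an assumed example value, not taken from the thread):

    import numpy as np

    def get_loss_weights_np(task_iter, global_step):
        # Start from equal weights 1/TASK_ITER for every inner (task) iteration.
        loss_weights = np.ones(task_iter) / task_iter
        decay_rate = 1.0 / task_iter / (10000 / 3)
        min_value = 0.03 / task_iter

        # Intermediate-step weights decay linearly down to min_value ...
        pre = np.maximum(loss_weights[:-1] - global_step * decay_rate, min_value)
        # ... while the last-step weight grows by the same total amount,
        # capped so that the weights always sum to 1.
        cur = np.minimum(loss_weights[-1] + global_step * (task_iter - 1) * decay_rate,
                         1.0 - (task_iter - 1) * min_value)
        return np.append(pre, cur)

    for step in [0, 1000, 5000]:
        print(step, get_loss_weights_np(5, step))

For example, at global_step 0 the weights are [0.2, 0.2, 0.2, 0.2, 0.2]; by step 5000 they have annealed to roughly [0.006, 0.006, 0.006, 0.006, 0.976], so the meta-loss is eventually dominated by the output of the final inner update.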