
Variable/Adam not found in checkpoint error when trying to restore the pre-trained model and finetune the network

Melody-doudou opened this issue 5 years ago · 2 comments

I am trying to restore the pre-trained model and finetune the network. However, restoring the pre-trained model raises the following error:


Key Variable/Adam not found in checkpoint
	 [[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

If I move the line G_opt = tf.train.AdamOptimizer(learning_rate=lr).minimize(G_loss) after saver = tf.train.Saver() and saver.restore(sess, ckpt.model_checkpoint_path), the model is restored successfully, but another error is raised when running _, G_current, output = sess.run([G_opt, G_loss, out_image], feed_dict={in_image: input_patch, gt_image: gt_patch, lr: learning_rate}):

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value beta1_power
	 [[Node: beta1_power/read = Identity[T=DT_FLOAT, _class=["loc:@Adam/Assign_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](beta1_power)]]

The error seems to be related to the variables created by the AdamOptimizer. Any suggestions?
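For context, here is a sketch of the workaround I was considering: after saver.restore(), initialize only the Adam bookkeeping variables that the checkpoint does not contain. The helper names are my own, and I have only written this against the TF 1.x API, so treat it as a sketch:

```python
def is_adam_state(name):
    """True for Adam bookkeeping variables such as g_conv10/weights/Adam,
    g_conv10/weights/Adam_1, beta1_power and beta2_power."""
    base = name.split(':')[0]
    return (base.endswith('/Adam') or base.endswith('/Adam_1')
            or base in ('beta1_power', 'beta2_power'))

def init_adam_state(sess):
    """Run after saver.restore(): initialize only the Adam variables,
    leaving the restored model weights untouched."""
    import tensorflow as tf  # imported here so the name filter above stays TF-free
    adam_vars = [v for v in tf.global_variables() if is_adam_state(v.name)]
    sess.run(tf.variables_initializer(adam_vars))
```

This avoids calling tf.global_variables_initializer() after the restore, which would overwrite the restored weights.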

Melody-doudou avatar Feb 19 '19 23:02 Melody-doudou

I do not have a machine to run the code right now. Can you try removing the Adam variables from the saver?

For example, saver = tf.train.Saver(t_vars)

Let me know if the problem is solved.

cchen156 avatar Feb 20 '19 00:02 cchen156

Hi, thanks for the reply. I tried removing the Adam variables from the saver, and it no longer raises any errors. However, I printed out all the variables saved in the checkpoint file; they are listed below:

('Variable', [2, 2, 256, 512])
('Variable_1', [2, 2, 128, 256])
('Variable_2', [2, 2, 64, 128])
('Variable_3', [2, 2, 32, 64])
('beta1_power', [])
('beta2_power', [])
('g_conv10/biases', [12])
('g_conv10/biases/Adam', [12])
('g_conv10/biases/Adam_1', [12])
('g_conv10/weights', [1, 1, 32, 12])
('g_conv10/weights/Adam', [1, 1, 32, 12])
('g_conv10/weights/Adam_1', [1, 1, 32, 12])
('g_conv1_1/biases', [32])
('g_conv1_1/biases/Adam', [32])
('g_conv1_1/biases/Adam_1', [32])
('g_conv1_1/weights', [3, 3, 4, 32])
('g_conv1_1/weights/Adam', [3, 3, 4, 32])
('g_conv1_1/weights/Adam_1', [3, 3, 4, 32])
('g_conv1_2/biases', [32])
('g_conv1_2/biases/Adam', [32])
('g_conv1_2/biases/Adam_1', [32])
('g_conv1_2/weights', [3, 3, 32, 32])
('g_conv1_2/weights/Adam', [3, 3, 32, 32])
('g_conv1_2/weights/Adam_1', [3, 3, 32, 32])
('g_conv2_1/biases', [64])
('g_conv2_1/biases/Adam', [64])
('g_conv2_1/biases/Adam_1', [64])
('g_conv2_1/weights', [3, 3, 32, 64])
('g_conv2_1/weights/Adam', [3, 3, 32, 64])
('g_conv2_1/weights/Adam_1', [3, 3, 32, 64])
('g_conv2_2/biases', [64])
('g_conv2_2/biases/Adam', [64])
('g_conv2_2/biases/Adam_1', [64])
('g_conv2_2/weights', [3, 3, 64, 64])
('g_conv2_2/weights/Adam', [3, 3, 64, 64])
('g_conv2_2/weights/Adam_1', [3, 3, 64, 64])
('g_conv3_1/biases', [128])
('g_conv3_1/biases/Adam', [128])
('g_conv3_1/biases/Adam_1', [128])
('g_conv3_1/weights', [3, 3, 64, 128])
('g_conv3_1/weights/Adam', [3, 3, 64, 128])
('g_conv3_1/weights/Adam_1', [3, 3, 64, 128])
('g_conv3_2/biases', [128])
('g_conv3_2/biases/Adam', [128])
('g_conv3_2/biases/Adam_1', [128])
('g_conv3_2/weights', [3, 3, 128, 128])
('g_conv3_2/weights/Adam', [3, 3, 128, 128])
('g_conv3_2/weights/Adam_1', [3, 3, 128, 128])
('g_conv4_1/biases', [256])
('g_conv4_1/biases/Adam', [256])
('g_conv4_1/biases/Adam_1', [256])
('g_conv4_1/weights', [3, 3, 128, 256])
('g_conv4_1/weights/Adam', [3, 3, 128, 256])
('g_conv4_1/weights/Adam_1', [3, 3, 128, 256])
('g_conv4_2/biases', [256])
('g_conv4_2/biases/Adam', [256])
('g_conv4_2/biases/Adam_1', [256])
('g_conv4_2/weights', [3, 3, 256, 256])
('g_conv4_2/weights/Adam', [3, 3, 256, 256])
('g_conv4_2/weights/Adam_1', [3, 3, 256, 256])
('g_conv5_1/biases', [512])
('g_conv5_1/biases/Adam', [512])
('g_conv5_1/biases/Adam_1', [512])
('g_conv5_1/weights', [3, 3, 256, 512])
('g_conv5_1/weights/Adam', [3, 3, 256, 512])
('g_conv5_1/weights/Adam_1', [3, 3, 256, 512])
('g_conv5_2/biases', [512])
('g_conv5_2/biases/Adam', [512])
('g_conv5_2/biases/Adam_1', [512])
('g_conv5_2/weights', [3, 3, 512, 512])
('g_conv5_2/weights/Adam', [3, 3, 512, 512])
('g_conv5_2/weights/Adam_1', [3, 3, 512, 512])
('g_conv6_1/biases', [256])
('g_conv6_1/biases/Adam', [256])
('g_conv6_1/biases/Adam_1', [256])
('g_conv6_1/weights', [3, 3, 512, 256])
('g_conv6_1/weights/Adam', [3, 3, 512, 256])
('g_conv6_1/weights/Adam_1', [3, 3, 512, 256])
('g_conv6_2/biases', [256])
('g_conv6_2/biases/Adam', [256])
('g_conv6_2/biases/Adam_1', [256])
('g_conv6_2/weights', [3, 3, 256, 256])
('g_conv6_2/weights/Adam', [3, 3, 256, 256])
('g_conv6_2/weights/Adam_1', [3, 3, 256, 256])
('g_conv7_1/biases', [128])
('g_conv7_1/biases/Adam', [128])
('g_conv7_1/biases/Adam_1', [128])
('g_conv7_1/weights', [3, 3, 256, 128])
('g_conv7_1/weights/Adam', [3, 3, 256, 128])
('g_conv7_1/weights/Adam_1', [3, 3, 256, 128])
('g_conv7_2/biases', [128])
('g_conv7_2/biases/Adam', [128])
('g_conv7_2/biases/Adam_1', [128])
('g_conv7_2/weights', [3, 3, 128, 128])
('g_conv7_2/weights/Adam', [3, 3, 128, 128])
('g_conv7_2/weights/Adam_1', [3, 3, 128, 128])
('g_conv8_1/biases', [64])
('g_conv8_1/biases/Adam', [64])
('g_conv8_1/biases/Adam_1', [64])
('g_conv8_1/weights', [3, 3, 128, 64])
('g_conv8_1/weights/Adam', [3, 3, 128, 64])
('g_conv8_1/weights/Adam_1', [3, 3, 128, 64])
('g_conv8_2/biases', [64])
('g_conv8_2/biases/Adam', [64])
('g_conv8_2/biases/Adam_1', [64])
('g_conv8_2/weights', [3, 3, 64, 64])
('g_conv8_2/weights/Adam', [3, 3, 64, 64])
('g_conv8_2/weights/Adam_1', [3, 3, 64, 64])
('g_conv9_1/biases', [32])
('g_conv9_1/biases/Adam', [32])
('g_conv9_1/biases/Adam_1', [32])
('g_conv9_1/weights', [3, 3, 64, 32])
('g_conv9_1/weights/Adam', [3, 3, 64, 32])
('g_conv9_1/weights/Adam_1', [3, 3, 64, 32])
('g_conv9_2/biases', [32])
('g_conv9_2/biases/Adam', [32])
('g_conv9_2/biases/Adam_1', [32])
('g_conv9_2/weights', [3, 3, 32, 32])
('g_conv9_2/weights/Adam', [3, 3, 32, 32])
('g_conv9_2/weights/Adam_1', [3, 3, 32, 32])

I noticed that only four variables ('Variable', 'Variable_1', 'Variable_2', 'Variable_3') have no corresponding /Adam and /Adam_1 entries. Could it be that this checkpoint was saved without these transposed-convolution weights ever being updated by the Adam optimizer? Also, if I simply remove the Adam variables from the saver, could that leave the network with worse weights than the original pre-trained model?
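To double-check, I partitioned the checkpoint's variable names (as returned by, e.g., tf.train.list_variables) with a small throwaway helper; the helper itself is just illustrative:

```python
def partition_checkpoint_names(names):
    """Split checkpoint variable names into model weights, Adam state,
    and weights that have no matching /Adam slot in the checkpoint."""
    adam = [n for n in names
            if n.endswith('/Adam') or n.endswith('/Adam_1')
            or n in ('beta1_power', 'beta2_power')]
    weights = [n for n in names if n not in adam]
    no_slot = [n for n in weights if n + '/Adam' not in names]
    return weights, adam, no_slot
```

Applied to the listing above, no_slot comes out as exactly Variable through Variable_3, which is consistent with my guess that the 2x2 transposed-convolution filters were saved without Adam state.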

I also have a question about GPU usage. Do I need to modify the code to specify a GPU device in order to use one? I basically just downloaded the code and ran it, and training is taking over 40 hours, so I am not sure whether the GPU is actually being used. Thanks!
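One way I could check, if I understand correctly (sketch only; the small filter helper is my own, device_lib is TensorFlow's standard device listing):

```python
def gpu_names(devices):
    """Names of visible GPU devices; an empty list means TensorFlow
    only sees the CPU."""
    return [d.name for d in devices if d.device_type == 'GPU']

# In the training environment (TF 1.x):
# from tensorflow.python.client import device_lib
# print(gpu_names(device_lib.list_local_devices()))
# An empty list would explain the 40+ hour training time.
```

TensorFlow will also log where each op is placed if the Session is created with tf.ConfigProto(log_device_placement=True).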

Melody-doudou avatar Feb 20 '19 04:02 Melody-doudou