SPADE-Tensorflow

Unable to run pretrained celebA hinge checkpoint

Open blazm opened this issue 5 years ago • 4 comments

Hi, I am unable to reproduce prediction or training using the existing celebA hinge checkpoint.

Here is the stack trace of calling random test with pretrained checkpoint:

python main.py --dataset spade_celebA --segmap_ch 3 --phase random

Everything runs fine until the checkpoints are read. Then some shapes are reported as mismatched, and I can't figure out what the problem could be. I am using TensorFlow 1.14. Do I need an older TF version for this to work, or were additional code modifications made after the pretrained checkpoint was saved?

UPDATE: I resolved the mismatching shapes lhs shape= [5,5,16,128] rhs shape= [5,5,19,128] as described below. Now the new mismatch is lhs shape= [32768,256] rhs shape= [8192,256].

[*] Reading checkpoints...
W1017 12:37:21.855408 139936319534848 deprecation.py:323] From /home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [5,5,16,128] rhs shape= [5,5,19,128]
  [[{{node save/Assign_623}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [5,5,16,128] rhs shape= [5,5,19,128]
  [[node save/Assign_623 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:537) ]]

Errors may have originated from an input operation. Input Source operations connected to node save/Assign_623: generator/spade_resblock_fix_2/spade_2/conv_128/conv2d/kernel/Adam (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:383)

Original stack trace for 'save/Assign_623': File "main.py", line 125, in main() File "main.py", line 116, in main gan.random_test() File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 537, in random_test self.saver = tf.train.Saver() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 825, in init self.build() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 837, in build self._build(self._filename, build_save=True, build_restore=True) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 875, in _build build_restore=build_restore) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal restore_sequentially, reshape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps assign_ops.append(saveable.restore(saveable_tensors, shapes)) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore self.op.get_shape().is_fully_defined()) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign validate_shape=validate_shape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign use_locking=use_locking, name=name) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File 
"/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in init self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 125, in main()
  File "main.py", line 116, in main
    gan.random_test()
  File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 538, in random_test
    could_load, checkpoint_counter = self.load(self.checkpoint_dir)
  File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 524, in load
    self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [5,5,16,128] rhs shape= [5,5,19,128] [[node save/Assign_623 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:537) ]]

Errors may have originated from an input operation. Input Source operations connected to node save/Assign_623: generator/spade_resblock_fix_2/spade_2/conv_128/conv2d/kernel/Adam (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:383)

Original stack trace for 'save/Assign_623': File "main.py", line 125, in main() File "main.py", line 116, in main gan.random_test() File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 537, in random_test self.saver = tf.train.Saver() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 825, in init self.build() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 837, in build self._build(self._filename, build_save=True, build_restore=True) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 875, in _build build_restore=build_restore) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal restore_sequentially, reshape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps assign_ops.append(saveable.restore(saveable_tensors, shapes)) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore self.op.get_shape().is_fully_defined()) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign validate_shape=validate_shape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign use_locking=use_locking, name=name) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File 
"/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in init self._traceback = tf_stack.extract_stack()

UPDATE: I resolved this issue by reproducing CelebAHQ masks with 19 segmentation labels (instead of 16 labels as originally defined in spade_celebA\segmap_label.txt).
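For anyone reading the first error later: a small sketch of how the two shapes can be compared to see which axis disagrees. It assumes the usual TensorFlow conv2d kernel layout [filter_height, filter_width, in_channels, out_channels], and that lhs is the variable in the current graph while rhs is the tensor stored in the checkpoint. The helper name `diff_axes` is made up for illustration.

```python
def diff_axes(graph_shape, ckpt_shape):
    """Return the axis indices where two shapes of equal rank disagree."""
    if len(graph_shape) != len(ckpt_shape):
        raise ValueError("shapes have different ranks")
    return [i for i, (g, c) in enumerate(zip(graph_shape, ckpt_shape)) if g != c]


# Shapes copied from the error message above.
lhs = [5, 5, 16, 128]  # variable built by the current graph
rhs = [5, 5, 19, 128]  # tensor stored in the checkpoint

axes = diff_axes(lhs, rhs)
print("mismatching axes:", axes)
for axis in axes:
    print("axis", axis, "graph =", lhs[axis], "checkpoint =", rhs[axis])
```

Only the in_channels axis differs (16 in the graph, 19 in the checkpoint), which is consistent with the number of segmentation labels being the cause.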

Now I get the following error (with mismatched dimensions 32768 and 8192):

[*] Reading checkpoints...
W1017 13:40:56.285303 139846562793216 deprecation.py:323] From /home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [32768,256] rhs shape= [8192,256]
  [[{{node save/Assign_185}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [32768,256] rhs shape= [8192,256]
  [[node save/Assign_185 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:539) ]]

Errors may have originated from an input operation. Input Source operations connected to node save/Assign_185: encoder/linear_var/kernel/Adam_1 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:385)

Original stack trace for 'save/Assign_185': File "main.py", line 125, in main() File "main.py", line 116, in main gan.random_test() File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 539, in random_test self.saver = tf.train.Saver() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 825, in init self.build() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 837, in build self._build(self._filename, build_save=True, build_restore=True) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 875, in _build build_restore=build_restore) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal restore_sequentially, reshape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps assign_ops.append(saveable.restore(saveable_tensors, shapes)) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore self.op.get_shape().is_fully_defined()) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign validate_shape=validate_shape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign use_locking=use_locking, name=name) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File 
"/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in init self._traceback = tf_stack.extract_stack()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 125, in main()
  File "main.py", line 116, in main
    gan.random_test()
  File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 540, in random_test
    could_load, checkpoint_counter = self.load(self.checkpoint_dir)
  File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 526, in load
    self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
  File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [32768,256] rhs shape= [8192,256] [[node save/Assign_185 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:539) ]]

Errors may have originated from an input operation. Input Source operations connected to node save/Assign_185: encoder/linear_var/kernel/Adam_1 (defined at /home/blaz/github/SPADE-Tensorflow/SPADE.py:385)

Original stack trace for 'save/Assign_185': File "main.py", line 125, in main() File "main.py", line 116, in main gan.random_test() File "/home/blaz/github/SPADE-Tensorflow/SPADE.py", line 539, in random_test self.saver = tf.train.Saver() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 825, in init self.build() File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 837, in build self._build(self._filename, build_save=True, build_restore=True) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 875, in _build build_restore=build_restore) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal restore_sequentially, reshape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 350, in _AddRestoreOps assign_ops.append(saveable.restore(saveable_tensors, shapes)) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 72, in restore self.op.get_shape().is_fully_defined()) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 227, in assign validate_shape=validate_shape) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 66, in assign use_locking=use_locking, name=name) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func return func(*args, **kwargs) File 
"/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op op_def=op_def) File "/home/blaz/anaconda2/envs/tf1.14/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2005, in init self._traceback = tf_stack.extract_stack()

Any ideas?

Thank you!

blazm, Oct 17 '19 11:10

You can change the contents of segmap_label.txt to:

{(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 2, (150, 30, 150): 3, (255, 65, 255): 4, (150, 80, 0): 5, (170, 120, 65): 6, (125, 125, 125): 7, (255, 255, 0): 8, (0, 255, 255): 9, (255, 150, 0): 10, (255, 225, 120): 11, (255, 125, 125): 12, (200, 100, 100): 13, (0, 255, 0): 14, (0, 150, 80): 15, (215, 175, 125): 16, (220, 180, 210): 17, (125, 125, 255): 18}

The reason is that the author's source code only contains 16 classes.
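A quick way to sanity-check a mapping like this before running the model is sketched below. It assumes segmap_label.txt holds a Python dict literal exactly as pasted in this comment; here the text is embedded as a string rather than read from the file, and the variable names are only illustrative.

```python
import ast

# The mapping from this comment, embedded as text. In practice it would be
# read from segmap_label.txt (assumed to hold the same dict literal).
label_text = (
    "{(0, 0, 0): 0, (0, 0, 255): 1, (255, 0, 0): 2, (150, 30, 150): 3, "
    "(255, 65, 255): 4, (150, 80, 0): 5, (170, 120, 65): 6, "
    "(125, 125, 125): 7, (255, 255, 0): 8, (0, 255, 255): 9, "
    "(255, 150, 0): 10, (255, 225, 120): 11, (255, 125, 125): 12, "
    "(200, 100, 100): 13, (0, 255, 0): 14, (0, 150, 80): 15, "
    "(215, 175, 125): 16, (220, 180, 210): 17, (125, 125, 255): 18}"
)

# literal_eval parses the text as data only and does not execute code.
labels = ast.literal_eval(label_text)

num_classes = len(labels)
# Label ids should run 0..num_classes-1 with no gaps or repeats.
ids_contiguous = sorted(labels.values()) == list(range(num_classes))

print("classes:", num_classes)
print("ids contiguous:", ids_contiguous)
```

The class count here should equal the in_channels value the checkpoint reports for the SPADE conv kernels (19 in the error above).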

eleven-0325, Oct 18 '19 06:10

Hi, thank you for your reply!

I already did that, as described in my update:

UPDATE: I resolved this issue by reproducing CelebAHQ masks with 19 segmentation labels (instead of 16 labels as originally defined in spade_celebA\segmap_label.txt).

Now I get the following error (with mismatched dimensions 32768 and 8192):

[*] Reading checkpoints... W1017 13:40:56.285303 139846562793216 deprecation.py:323] From ...

This is why I am not sure what else it could be.

blazm, Oct 28 '19 18:10

After you change segmap_label.txt, you can use:

Random test

python main.py --dataset spade_celebA --segmap_ch 3 --phase random

Guide test

python main.py --dataset spade_celebA --img_ch 3 --segmap_ch 3 --phase guide --guide_img ./guide_img.png

But when you train the model, segmap_label.txt should be created automatically. I don't get this problem (mismatched dimensions 32768 and 8192), but you can check whether the dataset size and the image size match.
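To make the image-size check concrete, here is a rough sketch of the arithmetic behind 32768 vs 8192. It assumes the failing layer (encoder/linear_var) takes a flattened height x width x channels feature map, and it assumes 512 channels purely for illustration; neither detail is confirmed in this thread.

```python
def flattened_size(height, width, channels):
    """Number of inputs to a dense layer fed by a flattened feature map."""
    return height * width * channels


graph_rows = 32768  # lhs: first dimension of the kernel in the current graph
ckpt_rows = 8192    # rhs: first dimension of the kernel in the checkpoint

ratio = graph_rows // ckpt_rows
print("ratio:", ratio)

# ASSUMPTION: 512 channels at the encoder output (not stated in the thread).
channels = 512
for side in (4, 8, 16):
    print(side, "x", side, "x", channels, "=", flattened_size(side, side, channels))
```

Under these assumptions the checkpoint matches a 4x4 feature map and the current graph an 8x8 one. A factor of 4 is what you would see if the feature map were twice as large in both height and width, for example from a larger input image or one fewer stride-2 downsampling step than the checkpoint was trained with.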

eleven-0325, Oct 31 '19 09:10

Have you solved this? I encounter the same problem when loading the pre-trained checkpoint.

fido20160817, May 02 '22 03:05