WassersteinGAN.tensorflow
tf.get_variable() error, variable does not exist or was not created
My tensorflow version is 0.12.1. When I run run_main.py, I get this error:
"ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/cond/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?"
Does anyone have any idea?
Maybe you could add:
with tf.variable_scope(tf.get_variable_scope(), reuse=False):
before ema.apply
https://github.com/carpedm20/DCGAN-tensorflow/issues/59
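For reference, in utils.py the change would sit roughly like this (just a sketch; the exact structure of the helper, and the names mean_var_with_update, batch_mean, batch_var and ema, are inferred from the traceback below, so they may differ from the actual code):
def mean_var_with_update():
    # Open a non-reusing scope so ema.apply() can create its
    # ExponentialMovingAverage slot variables with tf.get_variable().
    with tf.variable_scope(tf.get_variable_scope(), reuse=False):
        ema_apply_op = ema.apply([batch_mean, batch_var])
    # Make sure the moving averages are updated before returning the batch stats.
    with tf.control_dependencies([ema_apply_op]):
        return tf.identity(batch_mean), tf.identity(batch_var)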
This worked for me! (tensorflow 1.0 alpha)
This did not work for me! (tensorflow 1.0 nightly)
Traceback (most recent call last):
File "main.py", line 55, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "main.py", line 44, in main
FLAGS.optimizer_param)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 197, in create_network
scope_reuse=True)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 118, in _discriminator
h_bn = utils.batch_norm(h_conv, dims[index + 1], train_phase, scope="disc_bn%d" % index)
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 145, in batch_norm
lambda: (ema.average(batch_mean), ema.average(batch_var)))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1741, in cond
orig_res, res_t = context_t.BuildCondBranch(fn1)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1642, in BuildCondBranch
r = fn()
File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 139, in mean_var_with_update
ema_apply_op = ema.apply([batch_mean, batch_var])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 375, in apply
colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 135, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 112, in create_slot
return _create_slot_var(primary, val, "")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
validate_shape=val.get_shape().is_fully_defined())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1033, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 932, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
It took me two days to figure out a workaround, and I still ended up failing. The reason seems to be that although the fake discriminator sets scope_reuse to True, the tf.cond() statement creates a new control-flow context every time, so get_variable() cannot retrieve the corresponding variables from the real discriminator and throws the ValueError about .../.../discriminator_1/disc_bn1/.... According to my understanding, there shouldn't be a nested ../../discriminator_1 scope or a nested ../../../disc_bn1 scope at all; tell me if I am wrong. Anyway, I couldn't fix it by modifying the original code, so my workaround was to switch to tf.contrib.layers.batch_norm(). Done with one statement.
@kunrenzhilu: could you be more specific about how you switched to tf.contrib.layers.batch_norm()? I am struggling with the same problem stated above.
I have the same problem. After adding:
with tf.variable_scope(tf.get_variable_scope(), reuse=False):
before ema.apply
Another problem then comes up at model.initialize_network(FLAGS.logs_dir):
Traceback (most recent call last):
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
return fn(*args)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
status, run_metadata)
File "/home/jg/miniconda3/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
[[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media/jg/F/20170514/main.py", line 54, in <module>
tf.app.run()
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/media/jg/F/20170514/main.py", line 45, in main
model.initialize_network(FLAGS.logs_dir)
File "/media/jg/F/20170514/models/GAN_models.py", line 225, in initialize_network
self.sess.run(tf.global_variables_initializer())
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
[[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op 'Placeholder', defined at:
File "/media/jg/F/20170514/main.py", line 54, in <module>
tf.app.run()
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/media/jg/F/20170514/main.py", line 43, in main
FLAGS.optimizer_param)
File "/media/jg/F/20170514/models/GAN_models.py", line 173, in create_network
self._setup_placeholder()
File "/media/jg/F/20170514/models/GAN_models.py", line 149, in _setup_placeholder
self.train_phase = tf.placeholder(tf.bool)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1587, in placeholder
name=name)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2043, in _placeholder
name=name)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype bool
[[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Process finished with exit code 1
Check this out. It seems to have fixed the problems for me. https://github.com/AshishBora/WassersteinGAN.tensorflow/commit/1c6cfa1c20959e9dcca01f0a96f7ca8c54403d1a
UPDATE: After training for 8+ hours with this change, the GAN doesn't seem to learn anything and the loss ranges (for d_loss and g_loss) are way off.
UPDATE 2: I trained with this commit and TF v1.1.0. It seems to have learned to produce faces.
@AshishBora Hi, could you report the generator and discriminator loss values you get? I am training a WGAN on MNIST images and I see a g loss of ~200 and a d loss of ~0.003 in the first hour.
@kunrenzhilu could you give a concrete solution to the problem? I can't solve it either.
@AshishBora your commits are giving a 404 for me, could you show how did you fixed it?
@ayrtondenner I changed line 115 here to something like:
h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5, is_training=train_phase, scope="disc_bn%d" % index)
I should change line 326 too, right? They are both batch_norm calls inside the discriminator network.
Yup, that seems right.
I'm already running it, and it looks like it's going to work now. Anyway, do you know why your commits are 404'd now?
Great. Oh, 404 is because I deleted my fork some time ago since I wasn't using it anymore.
I see. I had to re-run it since there were still some minor changes needed because of TensorFlow 1.0 compatibility issues. Anyway, do you still have these commits? It would be nice to see whether you made any other code changes.
I have a local copy of the whole repo. I have uploaded a zip here.
I had the network training for 10 hours, 11k epochs, and that's the result I got. It's still not a human face, but I wanted to know whether the training is going OK, because as you said above, being able to run the network doesn't necessarily mean it's working. Also, I changed both utils.batch_norm calls in the discriminator network, but I just realized that there are also calls in the generator network; maybe I can replace those too to see if it works better.
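If the generator's batch norm calls look like the discriminator's, the replacement would be along these lines (a sketch only; the input name h_deconv and the scope name gen_bn%d are my guesses, not the repo's actual identifiers):
# Hypothetical generator-side equivalent of the discriminator change above.
h_bn = tf.contrib.layers.batch_norm(inputs=h_deconv, decay=0.9, epsilon=1e-5,
                                    is_training=train_phase, scope="gen_bn%d" % index)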
On tensorflow 1.12.0, I had the same problem and fixed it by adding the line:
with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
before ema.apply
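Concretely, that puts the helper in utils.py in roughly this shape (same sketch as the one earlier in this thread, only with tf.AUTO_REUSE, which lets tf.get_variable() create the EMA slot variables on the first call and reuse them afterwards even inside a reusing discriminator scope):
def mean_var_with_update():
    # tf.AUTO_REUSE: create the moving-average variables if they don't exist,
    # otherwise reuse them, regardless of the enclosing scope's reuse flag.
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
        ema_apply_op = ema.apply([batch_mean, batch_var])
    with tf.control_dependencies([ema_apply_op]):
        return tf.identity(batch_mean), tf.identity(batch_var)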