WassersteinGAN.tensorflow

tf.get_variable() error, variable does not exist or was not created

Open · SimpleXP opened this issue 7 years ago · 19 comments

My TensorFlow version is 0.12.1.

When I run run_main.py, I get this error:

"ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/cond/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?"

Does anyone have any idea?

SimpleXP avatar Feb 17 '17 02:02 SimpleXP

Maybe you could add the following before ema.apply:

    with tf.variable_scope(tf.get_variable_scope(), reuse=False):

https://github.com/carpedm20/DCGAN-tensorflow/issues/59
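
In context, the change would sit inside the batch_norm helper in utils.py, roughly like this (a sketch reconstructed from the traceback further down the thread; names such as batch_mean, batch_var, ema, and train_phase are taken from it, and the exact surrounding code may differ):

    def mean_var_with_update():
        # Always create (rather than look up) the EMA slot variables, even
        # when the enclosing discriminator scope has reuse=True.
        with tf.variable_scope(tf.get_variable_scope(), reuse=False):
            ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)

    mean, var = tf.cond(train_phase, mean_var_with_update,
                        lambda: (ema.average(batch_mean), ema.average(batch_var)))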

davidz-zzz avatar Feb 17 '17 17:02 davidz-zzz

This worked for me! (TensorFlow 1.0 alpha)

loliverhennigh avatar Feb 25 '17 00:02 loliverhennigh

This did not work for me! (TensorFlow 1.0 nightly)

Traceback (most recent call last):
  File "main.py", line 55, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "main.py", line 44, in main
    FLAGS.optimizer_param)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 197, in create_network
    scope_reuse=True)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/models/GAN_models.py", line 118, in _discriminator
    h_bn = utils.batch_norm(h_conv, dims[index + 1], train_phase, scope="disc_bn%d" % index)
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 145, in batch_norm
    lambda: (ema.average(batch_mean), ema.average(batch_var)))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1741, in cond
    orig_res, res_t = context_t.BuildCondBranch(fn1)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 1642, in BuildCondBranch
    r = fn()
  File "/home/long/MyCode2/WassersteinGAN.tensorflow-master/utils.py", line 139, in mean_var_with_update
    ema_apply_op = ema.apply([batch_mean, batch_var])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 375, in apply
    colocate_with_primary=(var.op.type in ["Variable", "VariableV2"]))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 135, in create_zeros_slot
    colocate_with_primary=colocate_with_primary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 112, in create_slot
    return _create_slot_var(primary, val, "")
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
    validate_shape=val.get_shape().is_fully_defined())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1033, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 932, in get_variable
    use_resource=use_resource, custom_getter=custom_getter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
    validate_shape=validate_shape, use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
    use_resource=use_resource)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable discriminator/disc_bn1/discriminator_1/disc_bn1/moments/moments_1/mean/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

chulaihunde avatar Mar 05 '17 05:03 chulaihunde

I spent two days trying to figure out a workaround and ended up with failure. The reason seems to be that although the fake discriminator sets scope_reuse to True, tf.cond() creates a new control-flow context every time, so get_variable() cannot retrieve the corresponding variables from the real discriminator and throws a ValueError mentioning .../discriminator_1/disc_bn1/.... According to my understanding, there shouldn't be a nested scope .../discriminator_1 or a nested .../disc_bn1; tell me if I am wrong. Anyway, I could not make the changes work on top of the original code. My workaround was to switch to tf.contrib.layers.batch_norm(). Done with one statement.
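
A minimal sketch of that one-statement replacement, with illustrative parameter values (tf.contrib.layers.batch_norm creates its variables through tf.get_variable, so it works under a reused discriminator scope):

    # Note: by default this layer adds its moving-average update ops to
    # tf.GraphKeys.UPDATE_OPS; those should be run alongside the train op.
    h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5,
                                        is_training=train_phase,
                                        scope="disc_bn%d" % index)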

kunrenzhilu avatar May 01 '17 09:05 kunrenzhilu

@kunrenzhilu: could you be more specific about how you used tf.contrib.layers.batch_norm()? I am struggling with the same problem stated above.

lengoanhcat avatar May 10 '17 21:05 lengoanhcat

I have the same problem. After adding

    with tf.variable_scope(tf.get_variable_scope(), reuse=False):

before ema.apply, another problem comes up at model.initialize_network(FLAGS.logs_dir):

Traceback (most recent call last):
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 45, in main
    model.initialize_network(FLAGS.logs_dir)
  File "/media/jg/F/20170514/models/GAN_models.py", line 225, in initialize_network
    self.sess.run(tf.global_variables_initializer())
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

Caused by op 'Placeholder', defined at:
  File "/media/jg/F/20170514/main.py", line 54, in <module>
    tf.app.run()
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "/media/jg/F/20170514/main.py", line 43, in main
    FLAGS.optimizer_param)
  File "/media/jg/F/20170514/models/GAN_models.py", line 173, in create_network
    self._setup_placeholder()
  File "/media/jg/F/20170514/models/GAN_models.py", line 149, in _setup_placeholder
    self.train_phase = tf.placeholder(tf.bool)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1587, in placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2043, in _placeholder
    name=name)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/jg/miniconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[Node: Placeholder = Placeholder[dtype=DT_BOOL, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]


Process finished with exit code 1
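
(For what it's worth, one untested way to sidestep this particular error would be to give the phase placeholder a default value, so that runs which reach it without a feed, like the initializer run above, can still proceed:)

    # Hypothetical workaround: a bool placeholder with a default value.
    self.train_phase = tf.placeholder_with_default(True, shape=[], name="train_phase")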

bottlecapper avatar May 14 '17 13:05 bottlecapper

Check this out. It seems to have fixed the problems for me. https://github.com/AshishBora/WassersteinGAN.tensorflow/commit/1c6cfa1c20959e9dcca01f0a96f7ca8c54403d1a

UPDATE: After training for 8+ hours with this change, the GAN does not seem to learn anything, and the loss ranges (for d_loss and g_loss) are way off.

UPDATE 2: I trained with this commit and TF v1.1.0. It seems to have learned to produce faces.

AshishBora avatar May 14 '17 17:05 AshishBora

Hi @AshishBora, could you report the numbers you get for the generator and discriminator losses? I am training a WGAN on MNIST images, and I see a g_loss of ~200 and a d_loss of ~0.003 in the first hour.

kinsumliu avatar Jun 13 '17 19:06 kinsumliu

@kunrenzhilu could you give a concrete solution to the problem? I cannot solve it either.

RyanHangZhou avatar Oct 22 '17 03:10 RyanHangZhou

@AshishBora your commits are giving a 404 for me; could you show how you fixed it?

ayrtondenner avatar Feb 13 '18 23:02 ayrtondenner

@ayrtondenner I changed line 115 here to something like:

    h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5, is_training=train_phase, scope="disc_bn%d" % index)

AshishBora avatar Feb 14 '18 01:02 AshishBora

I should change line 326 too, right? They are both batch_norm calls inside the discriminator network.

ayrtondenner avatar Feb 14 '18 01:02 ayrtondenner

Yup, that seems right.

AshishBora avatar Feb 14 '18 01:02 AshishBora

I'm already running it; it seems like it's going to work now. Anyway, do you know why your commits now give a 404?

ayrtondenner avatar Feb 14 '18 01:02 ayrtondenner

Great. Oh, the 404 is because I deleted my fork some time ago since I wasn't using it anymore.

AshishBora avatar Feb 14 '18 01:02 AshishBora

I see. I had to re-run it since there were still some minor changes needed because of TensorFlow 1.0 compatibility issues. Anyway, do you still have those commits? It would be nice to see whether you made any other code changes.

ayrtondenner avatar Feb 14 '18 01:02 ayrtondenner

I have a local copy of the whole repo. I have uploaded a zip here.

AshishBora avatar Feb 14 '18 02:02 AshishBora

I had the network training for 10 hours (11k epochs), and that's the result I got. It's still not a human face, but I wanted to know whether the training is going OK, because, as you said above, the network running doesn't necessarily mean it's working. Also, I changed both utils.batch_norm calls in the discriminator network, but I just realized there are also calls in the generator network; maybe I can replace those to see if it works better (a sketch of that replacement follows the images below).

[image: Loss functions]

[image: Network images]
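
For reference, the generator-side replacement would presumably mirror the discriminator one; the scope name "gen_bn%d" below is a guess, so check the actual names in GAN_models.py:

    h_bn = tf.contrib.layers.batch_norm(inputs=h_conv, decay=0.9, epsilon=1e-5,
                                        is_training=train_phase,
                                        scope="gen_bn%d" % index)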

ayrtondenner avatar Feb 14 '18 12:02 ayrtondenner

On TensorFlow 1.12.0, I had the same problem and fixed it by adding the line:

        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):

before ema.apply
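
In context (a sketch; the surrounding helper code in utils.py may differ):

    def mean_var_with_update():
        # tf.AUTO_REUSE (available since TF 1.4) creates the EMA variables on
        # the first call and reuses them on later calls, so both discriminator
        # instances share the same moving averages.
        with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
            ema_apply_op = ema.apply([batch_mean, batch_var])
        with tf.control_dependencies([ema_apply_op]):
            return tf.identity(batch_mean), tf.identity(batch_var)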

shimafoolad avatar Nov 25 '18 08:11 shimafoolad