Error from batch_norm

Open akaraspt opened this issue 7 years ago • 19 comments

I got this error when I was trying to run your scripts.

Traceback (most recent call last):
  File "train.py", line 238, in <module>
    main()
  File "train.py", line 76, in main
    input_tensors, variables, loss, outputs, checks = gan.build_model()
  File "/home/akara/Workspace/text-to-image/model.py", line 44, in build_model
    disc_wrong_image, disc_wrong_image_logits   = self.discriminator(t_wrong_image, t_real_caption, reuse = True)
  File "/home/akara/Workspace/text-to-image/model.py", line 165, in discriminator
    h1 = ops.lrelu( self.d_bn1(ops.conv2d(h0, self.options['df_dim']*2, name = 'd_h1_conv'))) #16
  File "/home/akara/Workspace/text-to-image/Utils/ops.py", line 34, in __call__
    ema_apply_op = self.ema.apply([batch_mean, batch_var])
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 391, in apply
    self._averages[var], var, decay, zero_debias=zero_debias))
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
    update_delta = _zero_debias(variable, value, decay)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
    trainable=False)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
    custom_getter=custom_getter)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
    custom_getter=custom_getter)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
    validate_shape=validate_shape)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
    caching_device=caching_device, validate_shape=validate_shape)
  File "/home/akara/miniconda2/envs/gan/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 650, in _get_single_variable
    "VarScope?" % name)
ValueError: Variable d_bn1/d_bn1_2/d_bn1_2/moments/moments_1/mean/ExponentialMovingAverage/biased does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

It happens when the script tries to create the second discriminator (the first call with reuse = True):

disc_real_image, disc_real_image_logits   = self.discriminator(t_real_image, t_real_caption)
disc_wrong_image, disc_wrong_image_logits   = self.discriminator(t_wrong_image, t_real_caption, reuse = True) # Here
disc_fake_image, disc_fake_image_logits   = self.discriminator(fake_image, t_real_caption, reuse = True)

I printed all the variables, and the second call seems to create variables under different names even though reuse = True is set.
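
A toy illustration of the failure, in plain Python rather than TensorFlow: with reuse turned on, a scope may only look up variables that already exist, so any helper that asks for a variable under a new name fails. ToyVariableStore, toy_discriminator, and the variable names here are hypothetical, and the per-call naming of the moving-average variable is an assumption based on the differing names in the traceback, not something confirmed from the library source.

```python
class ToyVariableStore(object):
    """Toy stand-in for TF 1.x variable scoping; not the real implementation."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, name, reuse):
        if reuse:
            # With reuse on, only lookups are allowed; nothing new is created.
            if name not in self.variables:
                raise ValueError(
                    "Variable %s does not exist, or was not created with "
                    "tf.get_variable()." % name)
            return self.variables[name]
        if name in self.variables:
            raise ValueError("Variable %s already exists." % name)
        self.variables[name] = object()
        return self.variables[name]


def toy_discriminator(store, call_index, reuse):
    # The conv weights have a fixed name, so a second call can find them.
    w = store.get_variable("d_h1_conv/w", reuse)
    # Assumption: the moving-average helper derives a fresh name on each call.
    ema = store.get_variable(
        "d_bn1/moments_%d/mean/biased" % call_index, reuse)
    return w, ema


store = ToyVariableStore()
toy_discriminator(store, 0, reuse=False)      # first call creates everything

error_message = None
try:
    toy_discriminator(store, 1, reuse=True)   # second call may only reuse
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

In this sketch the weight lookup succeeds on the second call and only the freshly named moving-average variable fails, which matches the shape of the error above.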

akaraspt avatar Feb 02 '17 23:02 akaraspt

same problem

ghost avatar Feb 20 '17 08:02 ghost

Is there any solution for this issue? @csbkwang @akaraspt @paarthneekhara

zhuolinumd avatar Apr 08 '17 02:04 zhuolinumd

What tensorflow version are you using? IIRC the code ran on version r0.10. I don't have access to a machine to debug the code right now.

paarthneekhara avatar Apr 08 '17 03:04 paarthneekhara

I used tensorflow 1.0. Thanks Paarth @paarthneekhara

zhuolinumd avatar Apr 08 '17 21:04 zhuolinumd

@jiang2764 So did it work?

paarthneekhara avatar Apr 09 '17 16:04 paarthneekhara

I got the same error when I tried to run the training code. That's why I asked you and the others. Thanks. @paarthneekhara

zhuolinumd avatar Apr 09 '17 16:04 zhuolinumd

Hi, this is a compatibility issue caused by the TensorFlow update. Replace the batch_norm class in ops.py with the one from https://github.com/iamaaditya/DCGAN-tensorflow/blob/master/ops.py . This should fix the issue.

paarthneekhara avatar Apr 09 '17 17:04 paarthneekhara

I replaced the batch_norm class using that ops.py. However, there is now another problem: Variable d_h0_conv/w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? How can I solve this? Thanks! @paarthneekhara

Duke-Wyh avatar Apr 11 '17 02:04 Duke-Wyh

At last I solved the problem! Two changes were needed. First, replace the batch_norm code using the ops.py linked above. Second, add with tf.variable_scope(tf.get_variable_scope()) to the code. Thanks everyone!

Duke-Wyh avatar Apr 11 '17 03:04 Duke-Wyh

I also got stuck on this problem and solved it another way. My tensorflow version is '0.12.1'. I replaced the batch_norm class in ops.py with the code from https://github.com/Hanock/generating_images_part_by_part/blob/master/code/lib/ops.py. I modified the init function (removed the parameter "batch_size") and it finally works.

OwalnutO avatar Apr 19 '17 11:04 OwalnutO

@Duke-Wyh thanks, but where should tf.variable_scope(tf.get_variable_scope()) go?

zhhezhhe avatar May 08 '17 07:05 zhhezhhe

@jiang2764 Did you solve this problem? I have the same problem. I used tensorflow 1.0.1.

zhhezhhe avatar May 12 '17 06:05 zhhezhhe

@zhhezhhe Please follow @paarthneekhara 's suggestion, update the ops file, and then modify the argument format for function tf.nn.sigmoid_cross_entropy_with_logits. The training process should work. Thanks @paarthneekhara ! I am running the training process now. I stopped working on this after I asked the question. Now it is time to go for this.
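
A rough sketch of what the argument-format change means, in plain Python rather than TensorFlow. As far as I recall, from TensorFlow 1.0 on the function only accepts named arguments (labels=..., logits=...) and rejects the older positional call. toy_sigmoid_cross_entropy below is a hypothetical stand-in, not the library function; the per-element formula is the numerically stable one given in the TensorFlow documentation.

```python
import math


def toy_sigmoid_cross_entropy(_sentinel=None, labels=None, logits=None):
    """Hypothetical stand-in that mimics the named-argument-only call style."""
    if _sentinel is not None:
        # A positional argument lands here, as with the old call style.
        raise ValueError("Only call with named arguments "
                         "(labels=..., logits=...)")
    if labels is None or logits is None:
        raise ValueError("Both labels and logits must be provided.")
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)),
    # where x is a logit and z is its label.
    return [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
            for x, z in zip(logits, labels)]


logits = [0.0, 2.0]
ones = [1.0, 1.0]

old_style_error = None
try:
    toy_sigmoid_cross_entropy(logits, ones)   # old positional style
except ValueError as exc:
    old_style_error = str(exc)

# New style: name both arguments so their order cannot be mixed up.
losses = toy_sigmoid_cross_entropy(labels=ones, logits=logits)
print(old_style_error)
print(losses)
```

In model.py the corresponding change would be to pass the logits tensor as logits= and the tf.ones_like / tf.zeros_like target as labels= in each loss term.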

zhuolinumd avatar May 12 '17 15:05 zhuolinumd

@OwalnutO , @jiang2764 if the method worked for you, can you please submit a pull request with the patch for the same?

paarthneekhara avatar May 12 '17 15:05 paarthneekhara

This may help: https://github.com/YearnyeenHo/text-to-image

zhhezhhe avatar Jul 05 '17 15:07 zhhezhhe

Where should tf.variable_scope(tf.get_variable_scope()) go? @Duke-Wyh

Using https://github.com/YearnyeenHo/text-to-image, I still have this problem with tensorflow 1.3: Variable d_h0_conv/w/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope? How can I solve it? Thank you @zhhezhhe

SpadesQ avatar Jan 01 '18 09:01 SpadesQ

Hi @SpadesQ , were you able to find a solution to this? I am facing the same issue.

After replacing the ops file, the Adam problem shows up during training. If I try to use the checkpoint instead, I get: NotFoundError (see above for traceback): Tensor name "d_bn1/moving_mean" not found in checkpoint files Data/Models/latest_model_flowers_temp.ckpt

314rated avatar Apr 03 '18 12:04 314rated

@paarthneekhara

When I try to generate images using the pre-trained model, I also get the following error.

NotFoundError (see above for traceback): Tensor name "d_bn1/moving_mean" not found in checkpoint files Data/Models/latest_model_flowers_temp.ckpt

I am using the code from https://github.com/YearnyeenHo/text-to-image and have downloaded the checkpoint file from the link given.

Please suggest a solution.

ravindra82 avatar Apr 19 '18 07:04 ravindra82

@paarthneekhara Thanks for writing this code. I have the same problem as above. I'm running the latest release of each library needed, but this one stumped me. Is there a good solution that makes this work? The discussion above is a bit of a hodgepodge. I'd like to see your solution, please.

TheScott463 avatar Aug 05 '20 01:08 TheScott463