ImageDenoisingGAN
How to train the model?
Hello manumathewthomas, thank you for your code. I am trying to train the model from scratch but ran into the problem below. Could you show me how to solve it?
Traceback (most recent call last):
File "train.py", line 91, in <module>
train()
File "train.py", line 23, in train
Dg = discriminator(Gz, reuse=True)
File "/home/ubuntu/trinh/Edited_ImageDenoisingGAN /model.py", line 32, in discriminator
conv1, conv1_weights = conv_layer(input, 4, 3, 48, 2, "d_conv1", reuse=reuse)
File "/home/ubuntu/trinh/Edited_ImageDenoisingGAN /conv_helper.py", line 10, in conv_layer
output = slim.batch_norm(output)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 643, in batch_norm
outputs = layer.apply(inputs, training=is_training)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 671, in apply
return self.__call__(inputs, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 559, in __call__
self.build(input_shapes[0])
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/normalization.py", line 201, in build
trainable=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py", line 458, in add_variable
trainable=trainable and self.trainable)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 417, in get_variable
return custom_getter(**custom_getter_kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1539, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1531, in _model_variable_getter
custom_getter=getter, use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
use_resource=use_resource, constraint=constraint)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py", line 742, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable d_conv1/BatchNorm/beta already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 217, in variable
use_resource=use_resource)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 181, in func_with_args
return func(*args, **current_args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py", line 262, in model_variable
use_resource=use_resource)
Are you running it on TensorFlow 1.0?
I have faced the same problem with TensorFlow 1.8.0.
Can you try with TensorFlow 1.0?
@Toni-Chan did you solve the above issue?
It seemed to work, but it would always crash after about fifteen cycles of training. It doesn't quite fit my task, so I dropped it.
I hit the same error with TensorFlow 1.10.
@TrinhQuocNguyen how did you solve the problem? I am new to this.
Hi, I have the same problem with TensorFlow 1.10. Have you solved it yet?
Hi, I added some code to conv_layer in conv_helper.py:
def conv_layer(input_image, ksize, in_channels, out_channels, stride, scope_name, activation_function=lrelu, reuse=False):
    with tf.variable_scope(scope_name):
        if reuse:
            tf.get_variable_scope().reuse_variables()  # (Generally this is the case)
        filter = tf.Variable(tf.random_normal([ksize, ksize, in_channels, out_channels], stddev=0.03))
        output = tf.nn.conv2d(input_image, filter, strides=[1, stride, stride, 1], padding='SAME')
        output = slim.batch_norm(output)
        if activation_function:
            output = activation_function(output)
        return output, filter
But when I train with this code, the loss becomes very large, e.g. -46595465644.
Facing the same issue on tensorflow v1.1.
@kaushiksk @wish829 You can try adding reuse=reuse to the variable scope in the function conv_layer:
with tf.variable_scope(scope_name, reuse=reuse):
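As a rough illustration of why forwarding the flag matters, here is a toy model of the bookkeeping in plain Python. It is not TensorFlow's implementation: `VariableStore` and its `get_variable` method are hypothetical stand-ins, and the `discriminator` function below only mimics the one variable named in the traceback. It shows a second call failing with "already exists" unless the caller asks for reuse.

```python
class VariableStore:
    """Toy stand-in for a variable scope's name -> variable bookkeeping."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, name, reuse=False):
        if name in self._vars:
            if not reuse:
                # Same situation as the ValueError in the traceback above.
                raise ValueError("Variable %s already exists, disallowed." % name)
            return self._vars[name]
        if reuse:
            raise ValueError("Variable %s does not exist." % name)
        self._vars[name] = object()  # placeholder for a real tensor variable
        return self._vars[name]


def discriminator(store, reuse=False):
    # Stands in for the batch-norm variable created inside conv_layer.
    return store.get_variable("d_conv1/BatchNorm/beta", reuse=reuse)


store = VariableStore()
beta_real = discriminator(store)              # first call creates the variable
beta_fake = discriminator(store, reuse=True)  # second call shares it

try:
    discriminator(store)  # reuse flag not forwarded: lookup is rejected
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(beta_real is beta_fake, raised)
```

In the real code the flag has to reach the scope that owns the batch-norm variables, which is what the `reuse=reuse` argument on `tf.variable_scope` does.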
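Hello, thank you very much for the reply. I have solved that problem, but I ran into another one: after running for a while, the error "GraphDef cannot be larger than 2GB" appears. Have you encountered it, and is there a fix?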
@wish829 I am hitting a problem that may be similar to yours, and I have no idea how to solve it yet.
Step 12000/100000 Gen Loss: 12350353000.0 Disc Loss: 1.4004402 PSNR: 26.14135902868828 SSIM: 0.8665272159942402
Step 12010/100000 Gen Loss: 12292836000.0 Disc Loss: 1.400274 PSNR: 25.94992507552601 SSIM: 0.8655586891626611
Step 12020/100000 Gen Loss: 15851811000.0 Disc Loss: 1.4003873 PSNR: 26.15803991559578 SSIM: 0.8670629243999279
Step 12030/100000 Gen Loss: 17971567000.0 Disc Loss: 1.4054713 PSNR: 25.995083258334443 SSIM: 0.8650341726544077
Step 12040/100000 Gen Loss: 11211838000.0 Disc Loss: 1.4014628 PSNR: 26.222578627884307 SSIM: 0.8671878400058987
Step 12050/100000 Gen Loss: 11266576000.0 Disc Loss: 1.4024365 PSNR: 26.241693138333112 SSIM: 0.8671688884932088
Step 12060/100000 Gen Loss: 17548194000.0 Disc Loss: 1.4003773 PSNR: 26.00901044193707 SSIM: 0.865215676049251
Step 12070/100000 Gen Loss: 23370770000.0 Disc Loss: 1.400408 PSNR: 26.120438055008826 SSIM: 0.8658342743148607
Step 12080/100000 Gen Loss: 10717686000.0 Disc Loss: 1.400349 PSNR: 26.106458721349533 SSIM: 0.8668624241359252
Step 12090/100000 Gen Loss: 11456956000.0 Disc Loss: 1.4003404 PSNR: 26.14799384579858 SSIM: 0.8670219117032115
Step 12100/100000 Gen Loss: 16212880000.0 Disc Loss: 1.4002614 PSNR: 26.113362434651755 SSIM: 0.8664612569021332
Step 12110/100000 Gen Loss: 17638543000.0 Disc Loss: 1.4002542 PSNR: 26.076150132750907 SSIM: 0.8671447135213092
Step 12120/100000 Gen Loss: 11208785000.0 Disc Loss: 1.4002402 PSNR: 26.08032439059292 SSIM: 0.8651132589262946
Step 12130/100000 Gen Loss: 10916414000.0 Disc Loss: 1.4002037 PSNR: 25.984372670853546 SSIM: 0.8638963023554623
Step 12140/100000 Gen Loss: 18496344000.0 Disc Loss: 1.4002202 PSNR: 26.088072032465114 SSIM: 0.8653338786023229
Step 12150/100000 Gen Loss: 16589135000.0 Disc Loss: 1.4002035 PSNR: 25.83919717160348 SSIM: 0.8617282185640999
Step 12160/100000 Gen Loss: 12576207000.0 Disc Loss: 1.4002212 PSNR: 26.20429954486186 SSIM: 0.8678690929255833
Step 12170/100000 Gen Loss: 10253853000.0 Disc Loss: 1.4001697 PSNR: 26.09443495684801 SSIM: 0.8662818556931983
Step 12180/100000 Gen Loss: 15371867000.0 Disc Loss: 1.4002001 PSNR: 26.039226803571175 SSIM: 0.8658360602984164
Step 12190/100000 Gen Loss: 23898941000.0 Disc Loss: 1.4022608 PSNR: 25.946668566224393 SSIM: 0.8640443029233819
Step 12200/100000 Gen Loss: 30478232000.0 Disc Loss: 1.4523464 PSNR: 25.13610710166375 SSIM: 0.8386938824245166
Step 12210/100000 Gen Loss: 28219340000.0 Disc Loss: 1.4221447 PSNR: 25.567860541426697 SSIM: 0.8482851503651547
Step 12220/100000 Gen Loss: 22672560000.0 Disc Loss: 1.4141589 PSNR: 25.98554835956014 SSIM: 0.8506765904464207
Step 12230/100000 Gen Loss: 24188324000.0 Disc Loss: 1.4071617 PSNR: 26.35134289574428 SSIM: 0.8587045622759665
Step 12240/100000 Gen Loss: 13836016000.0 Disc Loss: 1.4070854 PSNR: 26.488961297255237 SSIM: 0.8696061696106584
[libprotobuf FATAL external/protobuf_archive/src/google/protobuf/message_lite.cc:68] CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: (byte_size_before_serialization) == (byte_size_after_serialization): tensorflow.GraphDef was modified concurrently during serialization.
@wish829 I solved my problem by deleting the Graphs directory, which had been created in root mode, but the training process still often collapses.
2019-02-26 16:55:50.330234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-02-26 16:55:50.473725: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-26 16:55:50.473751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-02-26 16:55:50.473756: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-02-26 16:55:50.473909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7311 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Step 10/100000 Gen Loss: 37582434000.0 Disc Loss: 1.5633842 PSNR: 20.424394266955158 SSIM: 0.7810406358624654
Step 20/100000 Gen Loss: 35300233000.0 Disc Loss: 1.5486042 PSNR: 20.080686554357992 SSIM: 0.7819879789024556
Step 30/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
Step 40/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
Step 50/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
Step 60/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
I downloaded vgg16.tfmodel from another source, and when I run python train.py the following error occurs:
Traceback (most recent call last):
File "/home/usst/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 523, in import_graph_def
ret.append(name_to_op[operation_name].outputs[output_index])
KeyError: 'conv2_2/conv2_2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 91, in
@firdameng Why are Gen Loss and Disc Loss nan? Thanks.
Step 90/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
Step 100/100000 Gen Loss: nan Disc Loss: nan PSNR: 5.7679735050889995 SSIM: 0.00017517162219731627
Hi guys, the links to the CKPT file and the dataset no longer work. Could you send them to me if possible? Thanks!
Do you have the dataset file? Could you share a copy with me? Thank you very much. @TrinhQuocNguyen @manumathewthomas @Toni-Chan @phaniavi @coco1549134149 @stefenmax @qiongshuai @fourteen14fourteen @firdameng @wish829
Did you solve this problem?
Do you have the dataset file? Could you share a copy with me? I would be grateful if you can. Thank you very much. @TrinhQuocNguyen @manumathewthomas @Toni-Chan @phaniavi @coco1549134149 @stefenmax @qiongshuai @fourteen14fourteen @firdameng @wish829 @Tian14267
I can create my own dataset, but I don't know what the metric images are, or the other details of preparing a dataset for training. How did you do it, colleagues?
What is the image size of the dataset you created?
@firdameng I have the same problem with reuse... Thank you - I have solved it by adding this.
@Susan3333 The dataset is resized to 256x512.
Now I am having trouble understanding how to prepare the dataset. What names must the training and validation images have?
And about validation: are those the original images?
I have tried 'gauss + 2092 + A', but I'm not sure. Can anybody who has trained the model tell me the directory structure for the dataset? Where must the ground-truth images go?
Now I have another error. Why is there padding in the code? What is it for?
npad = ((0, 0), (56, 56), (0, 0), (0, 0))
validation = np.pad(validation, pad_width=npad, mode='constant', constant_values=0)
And this?
image = np.resize(image[7][56:, :, :], [144, 256, 3])
ValueError: Cannot feed value of shape (1, 368, 512, 3) for Tensor 'generated_image:0', which has shape '(?, 256, 256, 3)'
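A quick shape check may help here. The sketch below applies the same `npad` to all-zero batches and prints the resulting shapes. The 144x256 input size is an assumption inferred from the `[144, 256, 3]` in the resize call above, and `padded_shape` is a hypothetical helper, not part of the repository.

```python
import numpy as np

# Padding copied from the snippet in this thread: 56 rows of zeros above and below.
npad = ((0, 0), (56, 56), (0, 0), (0, 0))


def padded_shape(batch_shape):
    """Return the shape np.pad produces for an all-zero batch of the given shape."""
    batch = np.zeros(batch_shape, dtype=np.float32)
    padded = np.pad(batch, pad_width=npad, mode='constant', constant_values=0)
    return padded.shape


shape_144 = padded_shape((1, 144, 256, 3))  # assumed original image size
shape_256 = padded_shape((1, 256, 512, 3))  # the 256x512 dataset from this thread

print(shape_144)  # (1, 256, 256, 3)
print(shape_256)  # (1, 368, 512, 3)
```

Under that assumption, the padding exists to turn 144x256 images into the 256x256 input the placeholder expects (144 + 2 * 56 = 256). A 256x512 image goes through the same padding and comes out as 368x512, which matches the shape in the ValueError, so the dataset would need to be 144x256 or the padding and placeholder shape would need to change together.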
Thank you for your reply. (In reply to Marsel Iamaev's comment of 2020-09-17 13:33:59.)
So I got training started, but now I have two problems:
- GraphDef cannot be larger than 2GB. (Training breaks after several iterations.)
  tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot serialize protocol buffer of type tensorflow.GraphDef as the serialized size (2693993471bytes) would be larger than the limit (2147483647 bytes)
- Gen Loss: nan (Training breaks after several iterations.)