DCGAN-tensorflow
TEST Fail || HELP
After training, I get this message when I try to test:
(tf_gpu) S:\Pagmer\DCGAN\DCGAN-tensorflow-master\DCGAN-tensorflow-master>python main.py --dataset Doors --input_height=250 --crop {'G_img_sum': <absl.flags._flag.BooleanFlag object at 0x000001A603246F98>, 'batch_size': <absl.flags._flag.Flag object at 0x000001A601E230F0>, 'beta1': <absl.flags._flag.Flag object at 0x000001A601EA23C8>, 'checkpoint_dir': <absl.flags._flag.Flag object at 0x000001A603246978>, 'ckpt_freq': <absl.flags._flag.Flag object at 0x000001A603246E10>, 'crop': <absl.flags._flag.BooleanFlag object at 0x000001A603246AC8>, 'data_dir': <absl.flags._flag.Flag object at 0x000001A6032467B8>, 'dataset': <absl.flags._flag.Flag object at 0x000001A6032466A0>, 'epoch': <absl.flags._flag.Flag object at 0x000001A67BFD8320>, 'export': <absl.flags._flag.BooleanFlag object at 0x000001A603246BA8>, 'freeze': <absl.flags._flag.BooleanFlag object at 0x000001A603246C18>, 'h': <tensorflow.python.platform.app._HelpFlag object at 0x000001A60324D048>, 'help': <tensorflow.python.platform.app._HelpFlag object at 0x000001A60324D048>, 'helpfull': <tensorflow.python.platform.app._HelpfullFlag object at 0x000001A60324D0B8>, 'helpshort': <tensorflow.python.platform.app._HelpshortFlag object at 0x000001A60324D128>, 'input_fname_pattern': <absl.flags._flag.Flag object at 0x000001A603246710>, 'input_height': <absl.flags._flag.Flag object at 0x000001A6021F9208>, 'input_width': <absl.flags._flag.Flag object at 0x000001A603246518>, 'learning_rate': <absl.flags._flag.Flag object at 0x000001A600727898>, 'max_to_keep': <absl.flags._flag.Flag object at 0x000001A603246CC0>, 'out_dir': <absl.flags._flag.Flag object at 0x000001A603246828>, 'out_name': <absl.flags._flag.Flag object at 0x000001A6032468D0>, 'output_height': <absl.flags._flag.Flag object at 0x000001A603246588>, 'output_width': <absl.flags._flag.Flag object at 0x000001A603246630>, 'sample_dir': <absl.flags._flag.Flag object at 0x000001A6032469E8>, 'sample_freq': <absl.flags._flag.Flag object at 0x000001A603246D68>, 'train': 
<absl.flags._flag.BooleanFlag object at 0x000001A603246A20>, 'train_size': <absl.flags._flag.Flag object at 0x000001A601EA2B00>, 'visualize': <absl.flags._flag.BooleanFlag object at 0x000001A603246B38>, 'z_dim': <absl.flags._flag.Flag object at 0x000001A603246EB8>, 'z_dist': <absl.flags._flag.Flag object at 0x000001A603246F60>} 2019-05-15 18:50:27.450608: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2 2019-05-15 18:50:27.609271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7465 pciBusID: 0000:01:00.0 totalMemory: 8.00GiB freeMemory: 6.64GiB 2019-05-15 18:50:27.614137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-05-15 18:50:28.136045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-05-15 18:50:28.140202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-05-15 18:50:28.141478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-05-15 18:50:28.142812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer.
Variables: name (type shape) [size]
generator/g_h0_lin/Matrix:0 (float32_ref 100x524288) [52428800, bytes: 209715200]
generator/g_h0_lin/bias:0 (float32_ref 524288) [524288, bytes: 2097152]
generator/g_bn0/beta:0 (float32_ref 512) [512, bytes: 2048]
generator/g_bn0/gamma:0 (float32_ref 512) [512, bytes: 2048]
generator/g_h1/w:0 (float32_ref 5x5x256x512) [3276800, bytes: 13107200]
generator/g_h1/biases:0 (float32_ref 256) [256, bytes: 1024]
generator/g_bn1/beta:0 (float32_ref 256) [256, bytes: 1024]
generator/g_bn1/gamma:0 (float32_ref 256) [256, bytes: 1024]
generator/g_h2/w:0 (float32_ref 5x5x128x256) [819200, bytes: 3276800]
generator/g_h2/biases:0 (float32_ref 128) [128, bytes: 512]
generator/g_bn2/beta:0 (float32_ref 128) [128, bytes: 512]
generator/g_bn2/gamma:0 (float32_ref 128) [128, bytes: 512]
generator/g_h3/w:0 (float32_ref 5x5x64x128) [204800, bytes: 819200]
generator/g_h3/biases:0 (float32_ref 64) [64, bytes: 256]
generator/g_bn3/beta:0 (float32_ref 64) [64, bytes: 256]
generator/g_bn3/gamma:0 (float32_ref 64) [64, bytes: 256]
generator/g_h4/w:0 (float32_ref 5x5x3x64) [4800, bytes: 19200]
generator/g_h4/biases:0 (float32_ref 3) [3, bytes: 12]
discriminator/d_h0_conv/w:0 (float32_ref 5x5x3x64) [4800, bytes: 19200]
discriminator/d_h0_conv/biases:0 (float32_ref 64) [64, bytes: 256]
discriminator/d_h1_conv/w:0 (float32_ref 5x5x64x128) [204800, bytes: 819200]
discriminator/d_h1_conv/biases:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_bn1/beta:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_bn1/gamma:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_h2_conv/w:0 (float32_ref 5x5x128x256) [819200, bytes: 3276800]
discriminator/d_h2_conv/biases:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_bn2/beta:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_bn2/gamma:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_h3_conv/w:0 (float32_ref 5x5x256x512) [3276800, bytes: 13107200]
discriminator/d_h3_conv/biases:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_bn3/beta:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_bn3/gamma:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_h4_lin/Matrix:0 (float32_ref 524288x1) [524288, bytes: 2097152]
discriminator/d_h4_lin/bias:0 (float32_ref 1) [1, bytes: 4]
Total size of variables: 62093700
Total bytes of variables: 248374800
[] Reading checkpoints... ./out\20190515.185027 - data - Doors\checkpoint
[] Failed to find a checkpoint
Traceback (most recent call last):
File "main.py", line 147, in
Does anyone know what's up?
Hi. I faced exactly the same problem. Can anyone suggest how to resolve it?
Thanks in advance.
By default, the checkpoint is saved every 200 epochs. If you train for fewer epochs than that, no checkpoint will be saved and you won't be able to test your generator.
You can control the checkpoint frequency with --ckpt_freq and the number of training epochs with --epoch.
Hope it helps!
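Before testing, it can help to confirm that training actually wrote a checkpoint. The sketch below is only an illustration: it assumes the layout seen in the log above (a run folder under ./out containing a "checkpoint" subfolder), and find_saved_checkpoints is a hypothetical helper, not part of the repository.

```python
import os


def find_saved_checkpoints(out_dir):
    """Return the 'checkpoint' folders under out_dir that contain at least one file."""
    found = []
    if not os.path.isdir(out_dir):
        return found
    for run_name in sorted(os.listdir(out_dir)):
        ckpt_dir = os.path.join(out_dir, run_name, "checkpoint")
        # A run only counts if its checkpoint folder exists and is not empty.
        if os.path.isdir(ckpt_dir) and os.listdir(ckpt_dir):
            found.append(ckpt_dir)
    return found


if __name__ == "__main__":
    # "./out" is the output folder shown in the log; adjust if yours differs.
    for path in find_saved_checkpoints("./out"):
        print(path)
```

If nothing is printed, no run under ./out has a saved checkpoint yet, and the test run has nothing to load.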
Thanks for the answer! Is there a way to get different test results? I'm only getting the same one.
@Shnoogy I am facing the same issue. The generated images are always almost the same, with only minor changes such as pixel intensity. I have a small dataset of only 40 images, which may be the cause (overfitting?), but I'm not sure. Do you have a small dataset too? I tried changing the option of visualize() to 0, but it didn't resolve the issue. (#204)
Hi,
I am still not able to test. I trained for 300 epochs and still got: exception: Checkpoint not found in ./out/20190712.103742 - data - face/checkpoint
What should I do now? Please help me.
Command to test: python main.py --input_height 96 --input_width 96 --output_height 96 --output_width 96 --dataset face --crop --epoch 300 --input_fname_pattern ".jpg*"
Command to train: python main.py --input_height 96 --input_width 96 --output_height 96 --output_width 96 --dataset face --crop --train --epoch 300 --input_fname_pattern ".jpg*"
--
With Best Wishes and Regards Anishi Gupta
Did anyone find a solution? I tried training with 250 epochs but I am still getting the same error.
No. I tried with 300 epochs and it is still not working.
What I did was hardcode the saved checkpoint file into line 122 of main.py and set visualization to True. Now I can generate test images.
It throws a syntax error when I hardcode it.
This is the location of my checkpoint. What should I write inside dcgan.load()?
/root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6
Inside dcgan.load() I write /root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6
Add /checkpoint after - x96.z100.uniform_signed.y96.b6, for example /root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6/checkpoint
The checkpoint file should point to the latest checkpoint, which is saved every "x" iterations.
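A syntax error when hardcoding is often just a path pasted without quotes, since this folder name contains spaces and dashes. The sketch below shows one way to write it as a string; it assumes dcgan.load() takes the checkpoint directory as its argument, and checkpoint_dir_for is a hypothetical helper, not a function from the repository.

```python
import os

# Path copied from the thread. It contains spaces, so it must be a quoted string.
RUN_DIR = "/root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6"


def checkpoint_dir_for(run_dir):
    """Build the checkpoint folder path for a run folder (hypothetical helper)."""
    return os.path.join(run_dir, "checkpoint")


checkpoint_dir = checkpoint_dir_for(RUN_DIR)
print(checkpoint_dir)
print("exists:", os.path.isdir(checkpoint_dir))

# In main.py the call would then look roughly like this (assumption, not run here):
# dcgan.load(checkpoint_dir)
```

If the second line prints "exists: False", the path itself is wrong and loading cannot succeed, whatever the syntax.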
Hi
I did this, but it still fails to test.
The error screenshot is attached to this email.
Also, when I run the test command, other blank checkpoint folders are created.
Please help me with this.
Hi
I think the error lies in a mismatch between the current graph and the graph from the checkpoint.
File "/root/Desktop/DCGAN-tensorflow-master/model.py", line 547, in load
  self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
File "/root/anaconda2/envs/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
  err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [100,8192] rhs shape= [100,18432]
  [[node save/Assign_38 (defined at /root/Desktop/DCGAN-tensorflow-master/model.py:161) ]]
Errors may have originated from an input operation.
Input Source operations connected to node save/Assign_38:
  generator/g_h0_lin/Matrix (defined at /root/Desktop/DCGAN-tensorflow-master/ops.py:99)
Is there any problem with the input size or something?
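The two shapes in this error can be reproduced with some arithmetic, which hints at where the mismatch comes from. The sketch below rests on assumptions read off the variable listing at the top of the thread: 512 channels after g_h0_lin, followed by four upsampling layers (g_h1 to g_h4) that each double the image side. g_h0_width is a hypothetical helper written for this illustration.

```python
import math


def g_h0_width(output_size, base_channels=512, num_upsamples=4):
    """Width of the generator's first linear layer for a square output (assumed layout)."""
    side = output_size
    for _ in range(num_upsamples):
        # Working backwards, each stride-2 layer halves the side, rounding up.
        side = int(math.ceil(side / 2.0))
    return base_channels * side * side


print(g_h0_width(64))  # 8192
print(g_h0_width(96))  # 18432
```

Under these assumptions, 8192 corresponds to a 64x64 output and 18432 to a 96x96 output. One possible reading is that the graph built for the test run used a 64x64 output while the checkpoint was trained at 96x96, so it may be worth checking that the output size flags are identical in both runs.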
Send me the screenshot.
Hi, I attached the screenshot and also sent it over Gmail.
Hi
Please find the screenshot attached.
Moreover, I am using TensorFlow version 1.14.0. I don't think that will be a problem.
Please check the size of the train and test images.
I have about 100 images for training. I have not used a folder for test images.
What is the minimum number of images I should keep for training? I set epochs to 300. Is a test images folder needed? I think it will be generated automatically.
The image dimensions are already given on the command line.
I don't know where the problem is.
Please help me.
Hi, I am using 100 images for training with 300 epochs. It still fails to test.
I found that the problem is in main.py, specifically in these lines:
FLAGS.out_dir = os.path.join(FLAGS.out_dir, FLAGS.out_name)
FLAGS.checkpoint_dir = os.path.join(FLAGS.out_dir, FLAGS.checkpoint_dir)
FLAGS.sample_dir = os.path.join(FLAGS.out_dir, FLAGS.sample_dir)
Notice how the first os.path.join concatenates a directory and a folder name, while the latter two concatenate two directories. On macOS, I was getting two directories jammed together into one here, resulting in a long directory that didn't exist. The code was then just generating a new out directory with a new timestamp, and not finding a checkpoint inside (and why would it?).
I commented out these lines and added my own string manipulation to get it working, but it's messy. I recommend adding print(FLAGS.checkpoint_dir) and print(FLAGS.sample_dir) before the checkpoint check in the code. See if what you're getting is actually the directory you want, and work backwards. You may just want to set your own values immediately before the if/else loading the checkpoint.
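To see why a fresh timestamp leads to an empty checkpoint folder, here is a small sketch that mirrors the three os.path.join lines. It is only an illustration: default_out_name and resolve_dirs are hypothetical helpers, the run name format is copied from the folder names in this thread, and the "samples" default is an assumption.

```python
import os
import time


def default_out_name(data_dir, dataset, now=None):
    """Hypothetical stand-in for a timestamp-based run name."""
    stamp = time.strftime("%Y%m%d.%H%M%S", time.localtime(now))
    return "{} - {} - {}".format(stamp, os.path.basename(data_dir), dataset)


def resolve_dirs(out_dir, out_name, checkpoint_dir="checkpoint", sample_dir="samples"):
    """Mirror the three os.path.join lines from main.py."""
    run_dir = os.path.join(out_dir, out_name)
    return os.path.join(run_dir, checkpoint_dir), os.path.join(run_dir, sample_dir)


# Two runs started an hour apart resolve to different checkpoint folders.
train_ckpt, _ = resolve_dirs("./out", default_out_name("./data", "face", now=0))
test_ckpt, _ = resolve_dirs("./out", default_out_name("./data", "face", now=3600))
print(train_ckpt != test_ckpt)  # True

# A fixed run name resolves to the same folder every time.
fixed_a, _ = resolve_dirs("./out", "face_run")
fixed_b, _ = resolve_dirs("./out", "face_run")
print(fixed_a == fixed_b)  # True
print(fixed_a)
```

The flags dump at the top of the thread lists an out_name flag, so passing the same value for it in both the training and the test run may be a cleaner fix than editing the paths by hand. That has not been verified here, so print the resolved directories first to check.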
I ran into a similar problem. Did you find a solution? Thanks.