TEST Fail || HELP

Open Shnoogy opened this issue 5 years ago • 23 comments

After training, I get this message when I try to test:

(tf_gpu) S:\Pagmer\DCGAN\DCGAN-tensorflow-master\DCGAN-tensorflow-master>python main.py --dataset Doors --input_height=250 --crop
{'G_img_sum': <absl.flags._flag.BooleanFlag object at 0x000001A603246F98>, 'batch_size': <absl.flags._flag.Flag object at 0x000001A601E230F0>, 'beta1': <absl.flags._flag.Flag object at 0x000001A601EA23C8>, 'checkpoint_dir': <absl.flags._flag.Flag object at 0x000001A603246978>, 'ckpt_freq': <absl.flags._flag.Flag object at 0x000001A603246E10>, 'crop': <absl.flags._flag.BooleanFlag object at 0x000001A603246AC8>, 'data_dir': <absl.flags._flag.Flag object at 0x000001A6032467B8>, 'dataset': <absl.flags._flag.Flag object at 0x000001A6032466A0>, 'epoch': <absl.flags._flag.Flag object at 0x000001A67BFD8320>, 'export': <absl.flags._flag.BooleanFlag object at 0x000001A603246BA8>, 'freeze': <absl.flags._flag.BooleanFlag object at 0x000001A603246C18>, 'h': <tensorflow.python.platform.app._HelpFlag object at 0x000001A60324D048>, 'help': <tensorflow.python.platform.app._HelpFlag object at 0x000001A60324D048>, 'helpfull': <tensorflow.python.platform.app._HelpfullFlag object at 0x000001A60324D0B8>, 'helpshort': <tensorflow.python.platform.app._HelpshortFlag object at 0x000001A60324D128>, 'input_fname_pattern': <absl.flags._flag.Flag object at 0x000001A603246710>, 'input_height': <absl.flags._flag.Flag object at 0x000001A6021F9208>, 'input_width': <absl.flags._flag.Flag object at 0x000001A603246518>, 'learning_rate': <absl.flags._flag.Flag object at 0x000001A600727898>, 'max_to_keep': <absl.flags._flag.Flag object at 0x000001A603246CC0>, 'out_dir': <absl.flags._flag.Flag object at 0x000001A603246828>, 'out_name': <absl.flags._flag.Flag object at 0x000001A6032468D0>, 'output_height': <absl.flags._flag.Flag object at 0x000001A603246588>, 'output_width': <absl.flags._flag.Flag object at 0x000001A603246630>, 'sample_dir': <absl.flags._flag.Flag object at 0x000001A6032469E8>, 'sample_freq': <absl.flags._flag.Flag object at 0x000001A603246D68>, 'train': <absl.flags._flag.BooleanFlag object at 0x000001A603246A20>, 'train_size': <absl.flags._flag.Flag object at 0x000001A601EA2B00>, 'visualize': <absl.flags._flag.BooleanFlag object at 0x000001A603246B38>, 'z_dim': <absl.flags._flag.Flag object at 0x000001A603246EB8>, 'z_dist': <absl.flags._flag.Flag object at 0x000001A603246F60>}
2019-05-15 18:50:27.450608: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-05-15 18:50:27.609271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7465
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.64GiB
2019-05-15 18:50:27.614137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-05-15 18:50:28.136045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 18:50:28.140202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-05-15 18:50:28.141478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-05-15 18:50:28.142812: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6389 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From C:\ProgramData\Anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.

Variables: name (type shape) [size]

generator/g_h0_lin/Matrix:0 (float32_ref 100x524288) [52428800, bytes: 209715200]
generator/g_h0_lin/bias:0 (float32_ref 524288) [524288, bytes: 2097152]
generator/g_bn0/beta:0 (float32_ref 512) [512, bytes: 2048]
generator/g_bn0/gamma:0 (float32_ref 512) [512, bytes: 2048]
generator/g_h1/w:0 (float32_ref 5x5x256x512) [3276800, bytes: 13107200]
generator/g_h1/biases:0 (float32_ref 256) [256, bytes: 1024]
generator/g_bn1/beta:0 (float32_ref 256) [256, bytes: 1024]
generator/g_bn1/gamma:0 (float32_ref 256) [256, bytes: 1024]
generator/g_h2/w:0 (float32_ref 5x5x128x256) [819200, bytes: 3276800]
generator/g_h2/biases:0 (float32_ref 128) [128, bytes: 512]
generator/g_bn2/beta:0 (float32_ref 128) [128, bytes: 512]
generator/g_bn2/gamma:0 (float32_ref 128) [128, bytes: 512]
generator/g_h3/w:0 (float32_ref 5x5x64x128) [204800, bytes: 819200]
generator/g_h3/biases:0 (float32_ref 64) [64, bytes: 256]
generator/g_bn3/beta:0 (float32_ref 64) [64, bytes: 256]
generator/g_bn3/gamma:0 (float32_ref 64) [64, bytes: 256]
generator/g_h4/w:0 (float32_ref 5x5x3x64) [4800, bytes: 19200]
generator/g_h4/biases:0 (float32_ref 3) [3, bytes: 12]
discriminator/d_h0_conv/w:0 (float32_ref 5x5x3x64) [4800, bytes: 19200]
discriminator/d_h0_conv/biases:0 (float32_ref 64) [64, bytes: 256]
discriminator/d_h1_conv/w:0 (float32_ref 5x5x64x128) [204800, bytes: 819200]
discriminator/d_h1_conv/biases:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_bn1/beta:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_bn1/gamma:0 (float32_ref 128) [128, bytes: 512]
discriminator/d_h2_conv/w:0 (float32_ref 5x5x128x256) [819200, bytes: 3276800]
discriminator/d_h2_conv/biases:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_bn2/beta:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_bn2/gamma:0 (float32_ref 256) [256, bytes: 1024]
discriminator/d_h3_conv/w:0 (float32_ref 5x5x256x512) [3276800, bytes: 13107200]
discriminator/d_h3_conv/biases:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_bn3/beta:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_bn3/gamma:0 (float32_ref 512) [512, bytes: 2048]
discriminator/d_h4_lin/Matrix:0 (float32_ref 524288x1) [524288, bytes: 2097152]
discriminator/d_h4_lin/bias:0 (float32_ref 1) [1, bytes: 4]
Total size of variables: 62093700
Total bytes of variables: 248374800
 [*] Reading checkpoints... ./out\20190515.185027 - data - Doors\checkpoint
 [*] Failed to find a checkpoint
Traceback (most recent call last):
  File "main.py", line 147, in <module>
    tf.app.run()
  File "C:\ProgramData\Anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "main.py", line 124, in main
    raise Exception("Checkpoint not found in " + FLAGS.checkpoint_dir)
Exception: Checkpoint not found in ./out\20190515.185027 - data - Doors\checkpoint

Does anyone know what's up?

Shnoogy avatar May 15 '19 16:05 Shnoogy

Hi. I also faced the exact same problem. Can anyone suggest how to resolve this?

Thanks in advance.

ani16 avatar Jun 13 '19 06:06 ani16

By default, a checkpoint is saved every 200 epochs. If you train for fewer epochs than that, no checkpoint will be saved and you won't be able to test your generator.

You can control the checkpoint frequency with --ckpt_freq and the number of iterations with --epoch.
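A rough way to check whether a given run would ever write a checkpoint is sketched below. It assumes that the frequency set by --ckpt_freq is counted in training iterations (one per batch) and that a checkpoint is written whenever the iteration counter is a multiple of it; the real training loop may differ in its details. The function name and the dataset and batch sizes are made up for illustration.

```python
def checkpoint_steps(num_images, batch_size, epochs, ckpt_freq=200):
    """Return the iteration numbers at which a checkpoint would be written.

    Assumption: one iteration per full batch, and a save whenever
    iteration % ckpt_freq == 0. Incomplete batches are dropped.
    """
    steps_per_epoch = num_images // batch_size
    total_steps = steps_per_epoch * epochs
    return [s for s in range(1, total_steps + 1) if s % ckpt_freq == 0]


if __name__ == "__main__":
    # Hypothetical run: 100 images, batch size 64, 300 epochs.
    # That is 1 iteration per epoch, so 300 iterations in total.
    print(checkpoint_steps(100, 64, 300))
    # Same data but only 100 epochs: the counter never reaches 200.
    print(checkpoint_steps(100, 64, 100))
```

Under these assumptions a small dataset needs either many epochs or a lower --ckpt_freq before the first checkpoint appears.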

Hope it helps!

guillaumefrd avatar Jun 19 '19 13:06 guillaumefrd

Thanks for the answer! Is there a way to get different test results? I'm only getting the same one.

Shnoogy avatar Jun 19 '19 15:06 Shnoogy

@Shnoogy I am facing the same issue. The generated images are always nearly identical, with only minor changes such as pixel intensity. I have a small dataset of only 40 images, which may be the cause (overfitting?), but I'm not sure. Do you have a small dataset too? I tried changing the option of visualize() to 0, but it didn't resolve the issue. (#204)

guillaumefrd avatar Jun 20 '19 09:06 guillaumefrd

Hi,

I am still not able to test. I trained for 300 epochs and still got: Exception: Checkpoint not found in ./out/20190712.103742 - data - face/checkpoint

What should I do now? Please help me.

The command to test was: python main.py --input_height 96 --input_width 96 --output_height 96 --output_width 96 --dataset face --crop --epoch 300 --input_fname_pattern ".jpg*"

The command to train was: python main.py --input_height 96 --input_width 96 --output_height 96 --output_width 96 --dataset face --crop --train --epoch 300 --input_fname_pattern ".jpg*"

ani16 avatar Jul 12 '19 05:07 ani16

Did anyone find a solution? I tried training with 250 epochs but I'm still getting the same error.

matak07 avatar Jul 22 '19 08:07 matak07

No, I tried with 300 epochs and it's still not working.

ani16 avatar Jul 22 '19 10:07 ani16

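What I did was hardcode the saved checkpoint file into line 122 of the main.py file and set visualization to True, and now I can generate test images (screenshot: https://user-images.githubusercontent.com/40855134/61990593-c92a8e80-b05c-11e9-8ffd-166322c1c1f3.PNG).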

Muhammad057 avatar Jul 27 '19 05:07 Muhammad057

It's throwing a syntax error when I hardcode it.

This is the location of my checkpoint. What should I write inside dcgan.load()?

/root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6

ani16 avatar Aug 01 '19 06:08 ani16

I write inside dcgan.load() /root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6

Add /checkpoint after - x96.z100.uniform_signed.y96.b6, for example /root/Desktop/DCGAN-tensorflow-master/out/20190712.163251 - data - face - x96.z100.uniform_signed.y96.b6/checkpoint

The checkpoint file should point to the last checkpoint, which is saved after every "x" iterations.
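To see which checkpoint a folder actually points to before hardcoding its path, a small helper like the one below can read the index file. This is only a sketch: it assumes the folder contains the plain-text file named checkpoint that TensorFlow 1.x savers write, with a line of the form model_checkpoint_path: "name-step". The helper name is hypothetical and not part of the repository.

```python
import os
import re


def latest_checkpoint_name(checkpoint_dir):
    """Return the model name recorded in <checkpoint_dir>/checkpoint, or None.

    Assumption: the index file has a line such as
        model_checkpoint_path: "some.model-200"
    """
    index_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(index_file):
        return None  # nothing was ever saved in this folder
    with open(index_file) as f:
        for line in f:
            match = re.match(r'\s*model_checkpoint_path:\s*"(.+)"', line)
            if match:
                return os.path.basename(match.group(1))
    return None
```

A return value of None for the folder passed to dcgan.load() would mean there is nothing to restore from that location.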

Muhammad057 avatar Aug 01 '19 07:08 Muhammad057

Hi

I did this, but the test still fails.

The error screenshot is attached to this email.

Also, when I run the test command, other blank checkpoint folders are created.

Please help me with this.

ani16 avatar Aug 01 '19 09:08 ani16

Hi

I think the error lies in a mismatch between the current graph and the graph from the checkpoint.

File "/root/Desktop/DCGAN-tensorflow-master/model.py", line 547, in load
  self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
File "/root/anaconda2/envs/venv/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1322, in restore
  err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [100,8192] rhs shape= [100,18432]
  [[node save/Assign_38 (defined at /root/Desktop/DCGAN-tensorflow-master/model.py:161) ]]
Errors may have originated from an input operation.
Input Source operations connected to node save/Assign_38:
  generator/g_h0_lin/Matrix (defined at /root/Desktop/DCGAN-tensorflow-master/ops.py:99)

Is there any problem with input size or something?
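The two shapes in the error can be turned back into image sizes with a little arithmetic. The sketch below assumes the first generator layer reshapes its output to a square map with 512 channels (as in the variable listing earlier in this thread) and is followed by four stride-2 upsampling layers, and that lhs is the variable in the current graph while rhs is the value stored in the checkpoint. The function name is made up for illustration.

```python
import math


def output_size_from_columns(columns, channels=512, upsamplings=4):
    """Infer a square output size from the width of generator/g_h0_lin/Matrix.

    Assumptions: columns = side * side * channels, and each of the
    `upsamplings` later layers doubles the side length exactly.
    """
    cells, remainder = divmod(columns, channels)
    side = math.isqrt(cells)
    if remainder or side * side != cells:
        raise ValueError("columns do not describe a square feature map")
    return side * 2 ** upsamplings


print(output_size_from_columns(8192))   # lhs: graph built at test time
print(output_size_from_columns(18432))  # rhs: tensor stored in the checkpoint
```

Under those assumptions the numbers are consistent with a graph built for 64x64 output trying to load weights trained for 96x96 output, so it is worth checking that the output size used at test time matches the one used for training.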

ani16 avatar Aug 01 '19 10:08 ani16

Send me the screenshot.

Muhammad057 avatar Aug 01 '19 10:08 Muhammad057

error

Hi, I attached the screenshot and also sent it over Gmail.

ani16 avatar Aug 01 '19 10:08 ani16

Hi

Please find attached. Screenshot.

ani16 avatar Aug 01 '19 10:08 ani16

Moreover, I am using TensorFlow version 1.14.0. I don't think that will be a problem.

ani16 avatar Aug 01 '19 10:08 ani16

Please check the size of the train and test images.

Muhammad057 avatar Aug 01 '19 11:08 Muhammad057

It's about 100 images for training. I have not used a folder for test images.

ani16 avatar Aug 01 '19 11:08 ani16

What is the minimum number of images I should keep for training? I set epochs = 300. Is there any need for a test images folder? I think it will be generated automatically.

ani16 avatar Aug 01 '19 11:08 ani16

The image dimensions are already given on the command line.

I don't know where the damn problem is.

Please help me.

ani16 avatar Aug 01 '19 11:08 ani16

Hi, I am using 100 images for training with 300 epochs. It still fails to test.

ani16 avatar Aug 01 '19 14:08 ani16

I found that the problems happened in main.py. Specifically with these lines:

FLAGS.out_dir = os.path.join(FLAGS.out_dir, FLAGS.out_name)
FLAGS.checkpoint_dir = os.path.join(FLAGS.out_dir, FLAGS.checkpoint_dir)
FLAGS.sample_dir = os.path.join(FLAGS.out_dir, FLAGS.sample_dir)

Notice how the first os.path.join concatenates a directory and a folder name, while the latter two concatenate two directories. On macOS, I was getting two directories jammed together into one here, resulting in a long directory that didn't exist. The code was then just generating a new out directory with a new timestamp, and not finding a checkpoint inside (and why would it?).

I commented out these lines and added my own string manipulation to get it working, but it's messy. I recommend adding print(FLAGS.checkpoint_dir) and print(FLAGS.sample_dir) before the checkpoint check in the code. See if what you're getting is actually the directory you want, and work backwards. You may just want to set your own values immediately before the if/else loading the checkpoint.
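The path composition can be reproduced outside the project to see what each run ends up looking for. This is a minimal sketch that mimics only the three os.path.join lines quoted above; the function name, the run names, and the "samples" default are assumptions for illustration, while "./out" and "checkpoint" match the paths shown in the error messages in this thread.

```python
import os


def resolve_dirs(out_dir, out_name, checkpoint_dir="checkpoint", sample_dir="samples"):
    """Mimic the three os.path.join calls and return the resulting paths."""
    run_dir = os.path.join(out_dir, out_name)
    return {
        "out_dir": run_dir,
        "checkpoint_dir": os.path.join(run_dir, checkpoint_dir),
        "sample_dir": os.path.join(run_dir, sample_dir),
    }


if __name__ == "__main__":
    # Two runs with different names (for example, different timestamps)
    # never look in the same checkpoint folder.
    print(resolve_dirs("./out", "run_A")["checkpoint_dir"])
    print(resolve_dirs("./out", "run_B")["checkpoint_dir"])
    # os.path.join discards everything before an absolute component, so an
    # absolute checkpoint_dir ignores out_dir and out_name entirely.
    print(resolve_dirs("./out", "run_B", checkpoint_dir="/abs/ckpt")["checkpoint_dir"])
```

If the run name differs between the training run and the test run, the test run looks in a folder that training never wrote to, which would match the "Checkpoint not found" messages above. Reusing the training run's folder name for the test run is one way to make the two paths agree.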

trespassermax avatar Oct 19 '19 02:10 trespassermax

Quoting ani16: "Hi I am using 100 images for training with 300 epochs. Still fails to test"

I ran into a similar problem. Did you find a solution? Thanks.

JJisbug avatar Nov 11 '19 08:11 JJisbug