BMSG-GAN
The use of the loss function.
When I use wgan-gp as a loss function, the training fails. Any explanation?
@huangzh13, please check out the latest commit 4522d6772dd1d56b9eb073e4ea23c51562064812 which fixes the memory leak in the "wgan-gp" loss calculation. Also, in order for WGAN-GP to work, you need to tune the learning rates of the Discriminator and Generator differently. Try the values of:
g_lr = 0.003 and d_lr = 0.0003.
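For context, these two rates correspond to a two-timescale setup with separate optimizers for the two networks. A minimal PyTorch sketch of what that looks like (the stand-in modules and the Adam betas here are illustrative assumptions, not necessarily the repo's exact configuration):

```python
import torch
import torch.nn as nn

# stand-in modules; in BMSG-GAN these would be the full Generator and Discriminator
gen = nn.Linear(512, 512)
dis = nn.Linear(512, 1)

# two-timescale setup: the generator steps 10x faster than the discriminator
g_optim = torch.optim.Adam(gen.parameters(), lr=0.003, betas=(0.0, 0.99))
d_optim = torch.optim.Adam(dis.parameters(), lr=0.0003, betas=(0.0, 0.99))
```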
Hope this helps.
Please let me know how it works out. :+1:
Best regards, @akanimax
Thanks for your reply!
I have checked the latest code and tuned the learning rates. I also tried g_lr = 0.0003 and d_lr = 0.0003, but they didn't work.
The network tends to generate colour blocks.
@huangzh13, Yes, this is the desired behaviour. You get exactly these kinds of colour blocks at the high resolution, but could you please check the output at the lower resolutions? MSG-GAN indeed has this advantage over other GANs: you can check the training at the lower resolutions as well, so you get highly informative feedback throughout training. Also, just have some patience, because these colour blocks will very soon turn into your required samples.
For your reference:
This is the output of my network at the highest resolution (256x256) at the 15th epoch (the dataset is quite small, only 5.5K images):
And at the same time, the output at 8x8 is:
Basically, the point is that the training progresses from the bottom up, and then synchronizes everywhere. Hope this helps.
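The per-resolution inspection is possible because the generator emits one RGB image at every scale. A hypothetical sketch of that to-RGB pattern (the class name and channel sizes here are illustrative, not the repo's exact code):

```python
import torch
import torch.nn as nn

class MultiScaleToRGB(nn.Module):
    """One 1x1 'to RGB' converter per resolution, so every scale
    can be inspected (or fed to the discriminator) during training."""

    def __init__(self, channels=(512, 256, 128, 64)):
        super().__init__()
        self.converters = nn.ModuleList(
            [nn.Conv2d(c, 3, kernel_size=1) for c in channels]
        )

    def forward(self, features):
        # features: feature maps from the generator, lowest resolution first
        return [conv(f) for conv, f in zip(self.converters, features)]

# toy feature maps at 4x4, 8x8, 16x16 and 32x32
feats = [torch.randn(1, c, s, s)
         for c, s in zip((512, 256, 128, 64), (4, 8, 16, 32))]
rgbs = MultiScaleToRGB()(feats)
```

Each element of `rgbs` can be saved and inspected independently, which is what makes the low-resolution feedback available throughout training.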
Best regards, @akanimax
Am I too impatient?
I opened this issue because, after the same number of epochs, the model with the RelativisticAverageHinge loss had produced quite good images (both high and low resolution).
Does this mean that the RelativisticAverageHinge loss is a more suitable loss for BMSG-GAN?
@huangzh13, As I understood it, you wrote that you were not getting results at all. If you had mentioned that you had obtained good results using RaHinge, I wouldn't have written so. Also, I didn't say that you are impatient :smile:.
As I mentioned in the paper, with RaHinge (or the other relativistic variants of the losses) you don't have to tune the learning rates as much, and that's why we used the RaHinge loss. Could you please share your high-resolution results with RaHinge? It would be helpful for others as well.
Best regards, @akanimax
I trained the model on the CelebA dataset for only 3 epochs, but that was enough to verify the validity of your method.
(16x16)
(128x128)
Your paper and code helped me a lot!
Hi, I have a quick question about the loss for the two networks. I have just started using this code to try to generate better-quality images, because the paper sounded interesting. And since you have been able to generate good images, I thought this was a good place to ask, instead of opening a new issue. So, regarding the loss: for other GAN training you usually look at the losses of the discriminator and generator to detect collapses. Should a drop to zero of the discriminator loss be interpreted as the discriminator having won, or is that normal here? The generator loss is also all over the place, varying between 3 and 1000.
Generator Configuration:
Generator(
(layers): ModuleList(
(0): GenInitialBlock(
(conv_1): _equalized_deconv2d(512, 512, 4, 4)
(conv_2): _equalized_conv2d(512, 512, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(1): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(512, 512, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(2): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(512, 512, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(3): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(512, 512, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(4): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(256, 512, 3, 3)
(conv_2): _equalized_conv2d(256, 256, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(5): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(128, 256, 3, 3)
(conv_2): _equalized_conv2d(128, 128, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
(6): GenGeneralConvBlock(
(conv_1): _equalized_conv2d(64, 128, 3, 3)
(conv_2): _equalized_conv2d(64, 64, 3, 3)
(pixNorm): PixelwiseNorm()
(lrelu): LeakyReLU(negative_slope=0.2)
)
)
(rgb_converters): ModuleList(
(0): _equalized_conv2d(3, 512, 1, 1)
(1): _equalized_conv2d(3, 512, 1, 1)
(2): _equalized_conv2d(3, 512, 1, 1)
(3): _equalized_conv2d(3, 512, 1, 1)
(4): _equalized_conv2d(3, 256, 1, 1)
(5): _equalized_conv2d(3, 128, 1, 1)
(6): _equalized_conv2d(3, 64, 1, 1)
)
)
Discriminator Configuration:
Discriminator(
(rgb_to_features): ModuleList(
(0): _equalized_conv2d(256, 3, 1, 1)
(1): _equalized_conv2d(256, 3, 1, 1)
(2): _equalized_conv2d(256, 3, 1, 1)
(3): _equalized_conv2d(128, 3, 1, 1)
(4): _equalized_conv2d(64, 3, 1, 1)
(5): _equalized_conv2d(64, 3, 1, 1)
)
(final_converter): _equalized_conv2d(256, 3, 1, 1)
(layers): ModuleList(
(0): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(256, 512, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
(1): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(256, 512, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
(2): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(512, 512, 3, 3)
(conv_2): _equalized_conv2d(256, 512, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
(3): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(256, 256, 3, 3)
(conv_2): _equalized_conv2d(256, 256, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
(4): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(128, 128, 3, 3)
(conv_2): _equalized_conv2d(128, 128, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
(5): DisGeneralConvBlock(
(conv_1): _equalized_conv2d(64, 64, 3, 3)
(conv_2): _equalized_conv2d(64, 64, 3, 3)
(downSampler): AvgPool2d(kernel_size=2, stride=2, padding=0)
(lrelu): LeakyReLU(negative_slope=0.2)
)
)
(final_block): DisFinalBlock(
(batch_discriminator): MinibatchStdDev()
(conv_1): _equalized_conv2d(512, 513, 3, 3)
(conv_2): _equalized_conv2d(512, 512, 4, 4)
(conv_3): _equalized_conv2d(1, 512, 1, 1)
(lrelu): LeakyReLU(negative_slope=0.2)
)
)
Starting the training process ...
Epoch: 1
Elapsed [0:00:09.136577] batch: 1 d_loss: 2.338377 g_loss: 28.258415
Elapsed [0:07:07.622450] batch: 498 d_loss: 0.000000 g_loss: 9.889235
Elapsed [0:13:59.471320] batch: 996 d_loss: 0.000000 g_loss: 16.424404
Elapsed [0:20:50.408592] batch: 1494 d_loss: 0.000000 g_loss: 16.538080
Elapsed [0:27:41.144892] batch: 1992 d_loss: 0.077701 g_loss: 40.352341
Elapsed [0:27:48.926970] batch: 2000 d_loss: 0.000000 g_loss: 36.448349
Elapsed [0:34:32.944078] batch: 2490 d_loss: 0.000000 g_loss: 12.044651
Elapsed [0:41:23.530313] batch: 2988 d_loss: 0.000000 g_loss: 20.246790
Elapsed [0:48:14.023905] batch: 3486 d_loss: 0.026836 g_loss: 12.155825
Elapsed [0:55:04.547208] batch: 3984 d_loss: 0.000000 g_loss: 15.979989
Elapsed [0:55:18.911984] batch: 4000 d_loss: 0.000000 g_loss: 8.796050
Elapsed [1:01:56.169209] batch: 4482 d_loss: 0.000000 g_loss: 27.396168
Elapsed [1:08:46.572035] batch: 4980 d_loss: 0.000000 g_loss: 22.475296
Elapsed [1:15:37.071554] batch: 5478 d_loss: 0.000000 g_loss: 22.913460
Elapsed [1:22:27.626737] batch: 5976 d_loss: 0.000000 g_loss: 16.043606
Elapsed [1:22:48.539515] batch: 6000 d_loss: 0.000000 g_loss: 16.721962
Elapsed [1:29:19.261423] batch: 6474 d_loss: 0.000000 g_loss: 15.449608
Elapsed [1:36:09.594290] batch: 6972 d_loss: 0.000000 g_loss: 17.748825
Elapsed [1:43:00.011452] batch: 7470 d_loss: 0.282461 g_loss: 19.857460
Elapsed [1:49:50.445767] batch: 7968 d_loss: 0.000000 g_loss: 17.677956
Elapsed [1:50:17.950175] batch: 8000 d_loss: 0.000000 g_loss: 54.902863
Elapsed [1:56:42.026659] batch: 8466 d_loss: 0.000000 g_loss: 6.403756
Elapsed [2:03:32.422947] batch: 8964 d_loss: 0.000000 g_loss: 104.009857
Elapsed [2:10:22.878551] batch: 9462 d_loss: 0.000000 g_loss: 76.007889
Elapsed [2:17:13.808884] batch: 9960 d_loss: 0.000000 g_loss: 12.342272
Elapsed [2:17:47.965314] batch: 10000 d_loss: 0.069687 g_loss: 17.976465
Elapsed [2:24:06.157861] batch: 10458 d_loss: 0.000000 g_loss: 10.958153
Elapsed [2:30:57.107763] batch: 10956 d_loss: 0.000000 g_loss: 18.901848
Elapsed [2:37:48.229310] batch: 11454 d_loss: 0.069571 g_loss: 39.505516
Elapsed [2:44:39.324656] batch: 11952 d_loss: 0.000000 g_loss: 24.722958
Elapsed [2:45:20.027688] batch: 12000 d_loss: 0.000000 g_loss: 26.929533
Elapsed [2:51:31.513562] batch: 12450 d_loss: 0.000000 g_loss: 11.466362
Elapsed [2:58:22.621024] batch: 12948 d_loss: 0.012587 g_loss: 25.722101
Elapsed [3:05:13.673413] batch: 13446 d_loss: 0.000000 g_loss: 33.185280
Elapsed [3:12:04.843716] batch: 13944 d_loss: 0.000000 g_loss: 41.325165
Elapsed [3:12:52.146983] batch: 14000 d_loss: 0.000000 g_loss: 55.505829
Elapsed [3:18:57.129070] batch: 14442 d_loss: 0.000000 g_loss: 23.927341
Elapsed [3:25:48.259257] batch: 14940 d_loss: 0.000000 g_loss: 87.579803
Elapsed [3:32:39.568090] batch: 15438 d_loss: 0.000000 g_loss: 13.608449
Elapsed [3:39:31.122553] batch: 15936 d_loss: 0.000000 g_loss: 13.062461
Elapsed [3:40:25.185228] batch: 16000 d_loss: 0.213807 g_loss: 10.851057
Elapsed [3:46:24.286060] batch: 16434 d_loss: 0.000000 g_loss: 13.655262
Elapsed [3:53:16.516315] batch: 16932 d_loss: 0.023913 g_loss: 18.987953
Elapsed [4:00:09.128467] batch: 17430 d_loss: 0.000000 g_loss: 31.275900
Elapsed [4:07:02.309726] batch: 17928 d_loss: 0.000000 g_loss: 19.924543
Elapsed [4:08:03.928182] batch: 18000 d_loss: 0.000000 g_loss: 19.720871
Elapsed [4:13:57.862766] batch: 18426 d_loss: 0.000000 g_loss: 31.201002
Elapsed [4:20:51.300325] batch: 18924 d_loss: 0.000000 g_loss: 11.246223
Elapsed [4:27:44.639170] batch: 19422 d_loss: 0.000000 g_loss: 21.002312
Elapsed [4:34:37.392029] batch: 19920 d_loss: 0.000000 g_loss: 24.066872
Elapsed [4:35:45.667338] batch: 20000 d_loss: 0.000000 g_loss: 20.162798
Elapsed [4:41:33.112231] batch: 20418 d_loss: 0.000000 g_loss: 8.949071
Elapsed [4:48:27.750149] batch: 20916 d_loss: 0.000000 g_loss: 20.048477
Elapsed [4:55:22.106759] batch: 21414 d_loss: 0.000000 g_loss: 7.924605
Elapsed [5:02:16.560692] batch: 21912 d_loss: 0.000000 g_loss: 23.552919
Elapsed [5:03:32.165504] batch: 22000 d_loss: 0.000000 g_loss: 23.629990
Elapsed [5:09:14.060563] batch: 22410 d_loss: 0.000000 g_loss: 11.198383
Elapsed [5:16:08.628691] batch: 22908 d_loss: 0.000000 g_loss: 18.672043
Elapsed [5:23:02.965504] batch: 23406 d_loss: 0.000000 g_loss: 13.144236
Elapsed [5:29:57.806316] batch: 23904 d_loss: 0.000000 g_loss: 16.654057
Elapsed [5:31:20.712796] batch: 24000 d_loss: 0.000000 g_loss: 17.913643
Elapsed [5:36:56.825469] batch: 24402 d_loss: 0.925256 g_loss: 9.338171
Elapsed [5:43:52.270629] batch: 24900 d_loss: 0.000000 g_loss: 15.507483
Elapsed [5:50:47.319940] batch: 25398 d_loss: 0.000000 g_loss: 9.522266
Elapsed [5:57:42.477147] batch: 25896 d_loss: 0.024321 g_loss: 10.962959
Elapsed [5:59:11.415684] batch: 26000 d_loss: 0.000000 g_loss: 14.562057
Elapsed [6:04:40.540057] batch: 26394 d_loss: 0.000000 g_loss: 13.476106
Elapsed [6:11:35.331849] batch: 26892 d_loss: 0.001523 g_loss: 6.066289
Elapsed [6:18:31.255044] batch: 27390 d_loss: 0.000000 g_loss: 9.424282
Elapsed [6:25:27.569852] batch: 27888 d_loss: 0.615839 g_loss: 22.649738
Elapsed [6:27:04.382164] batch: 28000 d_loss: 0.000000 g_loss: 13.887462
Elapsed [6:32:27.148761] batch: 28386 d_loss: 0.000000 g_loss: 11.166329
Elapsed [6:39:22.719316] batch: 28884 d_loss: 0.000000 g_loss: 12.516731
Elapsed [6:46:18.041246] batch: 29382 d_loss: 0.000000 g_loss: 45.305389
Elapsed [6:53:13.288387] batch: 29880 d_loss: 0.000000 g_loss: 17.673706
Elapsed [6:54:56.054860] batch: 30000 d_loss: 0.000000 g_loss: 11.626278
Elapsed [7:00:11.837390] batch: 30378 d_loss: 0.000000 g_loss: 10.341488
Elapsed [7:07:06.584213] batch: 30876 d_loss: 0.000000 g_loss: 12.946634
Elapsed [7:14:00.786104] batch: 31374 d_loss: 0.000000 g_loss: 15.844328
Elapsed [7:20:54.952582] batch: 31872 d_loss: 0.733477 g_loss: 12.231483
Elapsed [7:22:43.626915] batch: 32000 d_loss: 0.998826 g_loss: 19.025818
Elapsed [7:27:52.113463] batch: 32370 d_loss: 0.009136 g_loss: 11.359642
Elapsed [7:34:46.386962] batch: 32868 d_loss: 0.000000 g_loss: 10.601990
Elapsed [7:41:40.555202] batch: 33366 d_loss: 0.000000 g_loss: 7.994129
Elapsed [7:48:34.853510] batch: 33864 d_loss: 0.000000 g_loss: 12.961597
Elapsed [7:50:31.224009] batch: 34000 d_loss: 0.000000 g_loss: 25.971003
Elapsed [7:55:33.052553] batch: 34362 d_loss: 0.000000 g_loss: 9.010212
Elapsed [8:02:27.217079] batch: 34860 d_loss: 0.000000 g_loss: 14.558136
Elapsed [8:09:21.306493] batch: 35358 d_loss: 0.009271 g_loss: 9.470531
Elapsed [8:16:15.476907] batch: 35856 d_loss: 0.000000 g_loss: 15.962500
Elapsed [8:18:17.330305] batch: 36000 d_loss: 0.000000 g_loss: 13.263047
Elapsed [8:23:12.587199] batch: 36354 d_loss: 0.000000 g_loss: 9.665047
Elapsed [8:30:06.782322] batch: 36852 d_loss: 0.000000 g_loss: 9.369083
Elapsed [8:37:01.056768] batch: 37350 d_loss: 0.000000 g_loss: 20.779835
Elapsed [8:43:55.301317] batch: 37848 d_loss: 0.000000 g_loss: 12.678933
Elapsed [8:46:03.871824] batch: 38000 d_loss: 0.000000 g_loss: 11.158394
Elapsed [8:50:52.478865] batch: 38346 d_loss: 0.000000 g_loss: 22.158953
Elapsed [8:57:46.757418] batch: 38844 d_loss: 0.261001 g_loss: 15.882168
Elapsed [9:04:40.888215] batch: 39342 d_loss: 0.000000 g_loss: 13.961426
Elapsed [9:11:34.938936] batch: 39840 d_loss: 0.000000 g_loss: 30.813374
Elapsed [9:13:50.013406] batch: 40000 d_loss: 0.000000 g_loss: 17.485420
Elapsed [9:18:31.741201] batch: 40338 d_loss: 0.050298 g_loss: 10.827607
Elapsed [9:25:25.269253] batch: 40836 d_loss: 0.000000 g_loss: 37.008732
Elapsed [9:32:18.905305] batch: 41334 d_loss: 0.000000 g_loss: 12.677896
Elapsed [9:39:12.592458] batch: 41832 d_loss: 0.000000 g_loss: 6.966478
Elapsed [9:41:34.326714] batch: 42000 d_loss: 0.000000 g_loss: 18.496456
Elapsed [9:46:09.368904] batch: 42330 d_loss: 0.000000 g_loss: 16.485332
Elapsed [9:53:02.944594] batch: 42828 d_loss: 0.000000 g_loss: 23.858597
Elapsed [9:59:56.496192] batch: 43326 d_loss: 0.000000 g_loss: 15.514748
Elapsed [10:06:49.922014] batch: 43824 d_loss: 0.000000 g_loss: 41.753380
Elapsed [10:09:18.270949] batch: 44000 d_loss: 0.000000 g_loss: 15.598457
Elapsed [10:13:46.774287] batch: 44322 d_loss: 0.118957 g_loss: 18.771698
Elapsed [10:20:40.530525] batch: 44820 d_loss: 0.000000 g_loss: 16.245186
Elapsed [10:27:34.159738] batch: 45318 d_loss: 0.000000 g_loss: 23.490396
Elapsed [10:34:27.848536] batch: 45816 d_loss: 0.021629 g_loss: 8.572326
Elapsed [10:37:02.696231] batch: 46000 d_loss: 0.000000 g_loss: 10.857470
Elapsed [10:41:24.695902] batch: 46314 d_loss: 0.000000 g_loss: 29.556044
Elapsed [10:48:18.148562] batch: 46812 d_loss: 0.030885 g_loss: 7.525697
Elapsed [10:55:11.625591] batch: 47310 d_loss: 0.045138 g_loss: 9.609584
Elapsed [11:02:05.220831] batch: 47808 d_loss: 0.129427 g_loss: 18.386642
Elapsed [11:04:46.484853] batch: 48000 d_loss: 0.000000 g_loss: 7.450067
Elapsed [11:09:01.766081] batch: 48306 d_loss: 0.000000 g_loss: 30.403725
Elapsed [11:15:55.357354] batch: 48804 d_loss: 0.000000 g_loss: 16.723495
Elapsed [11:22:48.939699] batch: 49302 d_loss: 0.000000 g_loss: 24.858213
Elapsed [11:29:42.613116] batch: 49800 d_loss: 0.000000 g_loss: 22.653553
Time taken for epoch: 41416.458 secs
Epoch: 2
Elapsed [11:30:18.414299] batch: 1 d_loss: 0.674042 g_loss: 14.901873
Elapsed [11:37:11.788918] batch: 498 d_loss: 0.000000 g_loss: 10.384844
Elapsed [11:44:05.690407] batch: 996 d_loss: 0.000000 g_loss: 21.126076
Elapsed [11:50:59.423473] batch: 1494 d_loss: 0.030664 g_loss: 11.375048
Elapsed [11:57:53.316281] batch: 1992 d_loss: 0.000000 g_loss: 15.586699
Elapsed [11:58:03.453155] batch: 2000 d_loss: 0.000000 g_loss: 18.800297
Elapsed [12:04:50.764074] batch: 2490 d_loss: 0.054183 g_loss: 9.096399
Elapsed [12:11:44.710594] batch: 2988 d_loss: 0.000000 g_loss: 9.935973
Elapsed [12:18:38.594535] batch: 3486 d_loss: 0.000000 g_loss: 9.296963
Elapsed [12:25:32.569276] batch: 3984 d_loss: 0.000000 g_loss: 17.901390
Elapsed [12:25:48.936851] batch: 4000 d_loss: 0.000000 g_loss: 15.580759
Elapsed [12:32:29.615201] batch: 4482 d_loss: 0.000000 g_loss: 16.824059
Elapsed [12:39:23.649362] batch: 4980 d_loss: 0.128205 g_loss: 14.636445
Elapsed [12:46:17.547570] batch: 5478 d_loss: 0.000000 g_loss: 12.853954
Elapsed [12:53:11.369015] batch: 5976 d_loss: 0.000000 g_loss: 21.049076
Elapsed [12:53:34.405860] batch: 6000 d_loss: 0.000000 g_loss: 11.767218
Elapsed [13:00:08.577302] batch: 6474 d_loss: 1.263216 g_loss: 9.647612
Elapsed [13:07:02.431205] batch: 6972 d_loss: 0.000000 g_loss: 9.879293
Elapsed [13:13:56.277202] batch: 7470 d_loss: 0.000000 g_loss: 14.576007
Elapsed [13:20:50.128382] batch: 7968 d_loss: 0.000000 g_loss: 25.199493
Elapsed [13:21:19.857440] batch: 8000 d_loss: 0.000000 g_loss: 12.980972
Elapsed [13:27:47.385592] batch: 8466 d_loss: 0.000000 g_loss: 11.132238
Elapsed [13:34:41.165060] batch: 8964 d_loss: 0.000000 g_loss: 16.122200
Elapsed [13:41:35.135643] batch: 9462 d_loss: 0.000000 g_loss: 12.721147
Elapsed [13:48:28.965222] batch: 9960 d_loss: 0.016159 g_loss: 12.740223
Elapsed [13:49:05.014303] batch: 10000 d_loss: 0.000000 g_loss: 17.202229
Elapsed [13:55:25.683578] batch: 10458 d_loss: 0.000000 g_loss: 12.412262
Elapsed [14:02:19.459729] batch: 10956 d_loss: 0.000000 g_loss: 17.442894
Elapsed [14:09:13.393001] batch: 11454 d_loss: 0.018738 g_loss: 13.172106
Elapsed [14:16:07.635433] batch: 11952 d_loss: 0.000000 g_loss: 10.256526
Elapsed [14:16:50.287763] batch: 12000 d_loss: 0.000000 g_loss: 6.879043
Elapsed [14:23:04.523150] batch: 12450 d_loss: 0.000000 g_loss: 9.990157
Elapsed [14:29:58.356799] batch: 12948 d_loss: 0.048164 g_loss: 11.465200
Elapsed [14:36:52.113379] batch: 13446 d_loss: 0.000000 g_loss: 9.312673
Elapsed [14:43:45.859499] batch: 13944 d_loss: 0.000000 g_loss: 12.853970
Elapsed [14:44:35.146990] batch: 14000 d_loss: 0.000000 g_loss: 6.012763
Elapsed [14:50:42.832900] batch: 14442 d_loss: 0.000000 g_loss: 20.063652
Elapsed [14:57:36.719575] batch: 14940 d_loss: 0.000000 g_loss: 21.113297
Elapsed [15:04:30.675369] batch: 15438 d_loss: 0.086522 g_loss: 15.928454
Elapsed [15:11:24.443290] batch: 15936 d_loss: 0.000000 g_loss: 8.222857
Elapsed [15:12:20.295643] batch: 16000 d_loss: 0.000000 g_loss: 14.333256
Elapsed [15:18:21.365570] batch: 16434 d_loss: 0.182449 g_loss: 15.999907
Elapsed [15:25:15.231426] batch: 16932 d_loss: 0.130269 g_loss: 16.058449
Elapsed [15:32:09.051052] batch: 17430 d_loss: 0.000000 g_loss: 23.200623
Elapsed [15:39:03.329984] batch: 17928 d_loss: 0.014015 g_loss: 8.625858
Elapsed [15:40:05.799918] batch: 18000 d_loss: 0.000000 g_loss: 17.969467
Elapsed [15:46:01.351715] batch: 18426 d_loss: 0.000000 g_loss: 22.537891
Elapsed [15:52:55.099180] batch: 18924 d_loss: 0.000000 g_loss: 13.364973
Elapsed [15:59:48.851702] batch: 19422 d_loss: 0.000000 g_loss: 11.498882
Best Regards
@fumoffu947,
Well, this is a characteristic of the relativistic hinge loss: for most of the training, the discriminator loss value remains 0, which actually adds to the stability of the training. It is indeed expected behaviour.
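A small numeric sketch (plain Python, independent of the repo's implementation) of the relativistic average hinge discriminator loss shows why it sits at exactly 0 once the real and fake scores are separated by the margin:

```python
def relu(x):
    return max(x, 0.0)

def mean(xs):
    return sum(xs) / len(xs)

def rahinge_d_loss(real_scores, fake_scores):
    """Relativistic average hinge loss for the discriminator:
    real scores should beat the *average* fake score by a margin of 1,
    and fake scores should fall below the average real score by 1."""
    r_mean, f_mean = mean(real_scores), mean(fake_scores)
    loss_real = mean([relu(1.0 - (r - f_mean)) for r in real_scores])
    loss_fake = mean([relu(1.0 + (f - r_mean)) for f in fake_scores])
    return loss_real + loss_fake

# once every real score exceeds the fake average (and vice versa) by > 1,
# both hinge terms clamp to zero, so the reported d_loss is exactly 0.0
settled = rahinge_d_loss([2.5, 3.0], [-2.0, -2.5])   # -> 0.0
early   = rahinge_d_loss([0.2, 0.1], [0.0, -0.1])    # -> positive
```

So a logged `d_loss: 0.000000` simply means the margin is currently satisfied, not that the gradients have vanished.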
Also, please update your code to the latest changes. I have added colour correction to the generated samples.
Also, please do post your trained results. It will help others. Hope this helps.
Best regards, @akanimax
I have updated to the latest code, but I do not have access to the results easily right now, as I have some restrictions on me. I will update the previous post with the trained results when I have access to them.
And a few questions regarding the training time and loss: will this structure of training (and networks) increase the training time, since it learns from the bottom up (and thus needs more epochs for good results)? And will the generator loss become more stable as the layers learn to generate better images, or is the high variance a characteristic of the RaHinge loss?
Thanks for the quick response. Best Regards
@akanimax, can you comment more about this? What can one expect to see reported as the loss for the Generator and Discriminator networks while the higher resolutions still look like solid colour blocks? I'm assuming there should be some wavering loss in both networks, but I'm experiencing a rapid drop to a loss of 0.0 for the Discriminator, no matter what learning rates I set. Currently I am testing with the Generator lr at 2 to 5 times the lr of the Discriminator.
Thanks,
@BlindElephants, Well, firstly please check the new training gif added to the readme; it explains more clearly how the training takes place. BTW, please note that our MSG-GAN uses the relativistic hinge loss, which is indeed a margin-adaptation loss at its heart. So please don't be discouraged by seeing a discriminator loss of 0.0; it is highly expected behaviour. This is unlike other loss functions such as WGAN-GP, where a 0.0 discriminator loss would indicate a complete collapse of training, after which no further training could happen (vanishing gradients). In our case, though, a 0.0 discriminator loss is a good sign of stability. Please do not spend time cherry-picking the learning rates; avoiding that was the main motivation behind our work. Please use the default values and let the model train. You will get good results.
Also, there is, unfortunately, nothing more that you can make out from the values of the losses here. They are just an indicator of the two-player game.
Hope this helps.
@akanimax
@BlindElephants I had the same colour blocks for about 20 epochs before any change. But the change can be seen in the lower layers long before it is seen in the last layer, as said by @akanimax. So look for changes in the lower layers first.
@fumoffu947 Thanks for the reply. I ended up just letting it run for a while, despite seeing loss=0.0 on the discriminator side from the beginning. You're totally right, the solid color patches precede some really interesting developments, and indeed, I did see changes and more detail in lower layers first.
Here's a time lapse video I posted on Vimeo of training: https://vimeo.com/330681428. Source material is the movie Edge of Tomorrow (yes, I know... this great work of sci-fi action...) which was frame dumped to produce about 38,000 images. I stopped this training roughly where this video ends, so things are still quite abstract and only just starting to form recognizable shapes. But great test.
Thanks @akanimax, this repo is great and super interesting.
@BlindElephants,
That is really interesting. You'd have gotten even better results with a little more training. BTW, did you use the raw frames, or did you do some preprocessing on the extracted images before training?
Also, please feel free to open a PR like @huangzh13, if you'd like to share your results (through the readme).
Best regards, @akanimax
I used raw frames dumped with ffmpeg from the original source video. Have not played with additional preprocessing yet.
I currently am running a follow-up training session that is further along than where I ended this sample and you're right, things are getting really interesting quickly. Will open PR when appropriate to share findings.
@BlindElephants, Glad to know that. Happy to help.
Best regards, @akanimax
@BlindElephants @akanimax @huangzh13 @fumoffu947 Can I get a bit more info on your runs?
I'm trying to run this on Colab with a single K80 GPU. I have a 10k-image dataset I want to train with at 128x128. I've set the batch_size to 16 to see if the training moves a little faster (judging by the log output); I don't know if this is right. Should I increase the batch_size?
How long does it take to train a model with 2x V100 GPUs, say at a 128 or 256 image size?
I'm thinking of firing up a Google Cloud instance, since I'm on the free tier, to train my model. Any recommendations on specs? vCPUs, RAM, 2x V100, 4x V100?
Thanks and awesome work, Tiago
@talvasconcelos I can't really say what kind of hardware you should use. I used a Quadro P5000 16 GB graphics card with a batch size of 8 (because of the huge fluctuation in memory usage before it stabilizes). As you can see in my log, one epoch took about 11 hours. I got really good results after 5 days (when I ran on medical images of size 256x256). It should probably run for longer, which would result in better images.
As for the batch size: make it as large as the graphics card allows. Larger batch sizes give better gradients, because of the more varied images, and may speed up the convergence of the model. But the batch size should not be as large as the dataset :D.
A newer graphics card (than the Quadro P5000) will probably decrease the training time significantly, as they have become faster. But as far as I know, there are no good stopping criteria for a GAN. You will have to inspect the images and decide when they are good enough, so the model might have to train for a while. (If I am wrong about this, please correct me.)
Best Regards
@talvasconcelos I've run training sessions with a few different data sets, currently one set is approx 9,000 images and the other is approx 60,000 images.
I'm training at 1024x1024 resolution, on a machine that has two RTX 2080ti GPUs. With that, I can do batch size = 5.
For the training set that has about 9,000 images, it takes just over an hour (about 65 minutes) per epoch.
Also, probably most importantly, I'm not looking to generate 100% realistic images with this, I'm a working artist and use M.L. software to generate images/video for aesthetic and conceptual goals: what I look for in the outcome of a training session may be different than yours.
I've so far let this run to about epoch 200 with some really interesting results. (200 epochs == approx 200 hours == 8.33333 days).
Obviously the training set with 60,000 images takes much longer per epoch, but is also achieving some interesting visual outcomes on a similar time scale. I haven't fully explored this model yet, so can't comment further.
Make sure that you have cudnn benchmark enabled. Parallelize what you can if you have multiple GPUs. And ensure that your data loader is set up with enough workers to keep up with your needs.
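A minimal PyTorch sketch of those two settings (the dataset here is a random stand-in, and the batch size and worker count are placeholders to tune for your own hardware):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# let cuDNN benchmark and cache the fastest convolution algorithms
# (helps when the input sizes stay fixed, as they do during GAN training)
torch.backends.cudnn.benchmark = True

# random stand-in for a real image dataset
dataset = TensorDataset(torch.randn(64, 3, 128, 128))
loader = DataLoader(
    dataset,
    batch_size=8,       # placeholder; raise until the GPU runs out of memory
    shuffle=True,
    num_workers=4,      # placeholder; too few workers starves the GPU
    pin_memory=True,    # faster host-to-GPU copies
)
```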
Wow guys, thanks a lot for the input. I've had some runs of GANs: I made a DCGAN in Keras and tried a few alternatives. The problem with my DCGAN is that it gives some "good" results at 64x64 (@BlindElephants, I'm also trying to do some art stuff, so no realistic output needed either). But at 128x128, apart from taking forever, it collapses after a while.
I've let BMSG-GAN run with default settings, other than my dataset of 10k images. It's been running for about 2 hours, with a batch size of 16 and a latent size of 256. The second batch was done after 1h35m. Considering Colab stops the notebook after 12h, it's not going to be a pretty process...
Might spin up the VM after all, with a couple of powerful GPUs...
EDIT: I'm so stupid, I was training on CPU... I forgot to set the runtime to GPU. Now it's taking around 5m per epoch!
So... epoch 114 with hyperparameters: latent=256, batch_size=32 for the first 100 epochs (now running with 48), on the 10k-image dataset.
My dataset is not as homogeneous as the faces... @akanimax, how long did the flowers test (the second one, with a bigger difference between pictures) run? What were your parameters?
@BlindElephants and @fumoffu947, Thank you so much for the help.
@talvasconcelos,
Your latent size of 256 is too small. Please increase it to at least 512; you will have to reduce your batch size accordingly. Everything else seems fine to me. Your dataset is 10K, right? I suppose you should start getting good results at around 1000 epochs. I trained on the 8K Oxford Flowers dataset for 3000 epochs and obtained good results at around 800-1000 epochs.
Hope this helps.
Best regards, @akanimax
@fumoffu947, You can monitor the FID scores of the trained models. I am going to add code to monitor the FID during training itself later. Currently, you have to run a post-training script to calculate the FID for all the checkpoints and then use the one with the lowest FID score.
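The post-training sweep can be as simple as scoring every saved checkpoint and keeping the minimum. A sketch of just the selection logic (`compute_fid` and the checkpoint names here are hypothetical stand-ins, not the repo's actual script):

```python
def select_best_checkpoint(checkpoints, compute_fid):
    """Score every checkpoint with the provided FID function
    and return the one with the lowest (best) score."""
    scores = {ckpt: compute_fid(ckpt) for ckpt in checkpoints}
    best = min(scores, key=scores.get)
    return best, scores[best]

# toy stand-in scores, just to exercise the selection logic;
# in practice compute_fid would load the checkpoint, generate samples,
# and compare their statistics against the real dataset
fake_scores = {"GAN_GEN_100.pth": 41.2,
               "GAN_GEN_500.pth": 18.7,
               "GAN_GEN_900.pth": 23.4}
best, fid = select_best_checkpoint(fake_scores, fake_scores.get)
```

Note that the best FID is often not the final checkpoint, which is exactly why the sweep is worth running.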
@akanimax This is an interesting discussion, as I am running into similar issues. My dataset is about 10k (512x512). After 50 epochs I see reasonable structural results up to 64x64, but even after 100 epochs anything above that level still looks very wonky: none of the finer details seem to be translated correctly, and there are clear blob-shaped artifacts all over the place.
I am just worried that it might have already collapsed and that putting any more training time in does not make much of a difference. The results between epoch 75 and epoch 100 at the higher resolutions are not significantly better. Should I simply give it more epochs? Or would it be better to increase the dataset size?
I am using the relativistic-hinge loss, btw.
I also tried ProGAN with this dataset and it has a similar issue there as well, which is quite interesting, because with ProGAN I also tried a very different but much smaller dataset (<5k) that does produce reasonable results in even fewer training epochs.
@Mut1nyJD, @fumoffu947, @BlindElephants, @talvasconcelos The dynamics of MSG-GAN are a bit unique. From my experience, you usually need around 1000 epochs to get good results (crisp and clear, like the ones displayed in the repo's README) for a dataset below 10K images. The info on how long all the datasets (Flowers, CelebA, LSUN Bedrooms, CelebA-HQ, etc.) were trained for is elaborated in the supplementary material, which is not included in the arXiv version of our paper yet. But for your reference, I am sharing the FID plot of the Oxford Flowers run here, since the size of the Oxford Flowers dataset (8K) is quite similar to the ones you are experimenting with.
I hope this will give you more information about the training dynamics of MSG-GAN. But one thing for sure, if you are getting good results on lower resolution, they always translate to the higher resolution eventually.
Best regards, @akanimax
@talvasconcelos Use a larger latent size.
You might also want to try something with a very small training set ( <= 5000, or even 1000-2000) just as a test to see what happens (for your own sake, I mean). If you provide a subset of training images that all conform to a particular type or subject, the model should converge quite quickly. If you observe this by outputting periodic samples, you should be able to get an idea for what to expect when you move to a much larger training set, which will possibly follow similar convergence behavior, albeit on a longer time scale with much more varied output.
Okay, I am going to post some results soon. Indeed, things got better after waiting longer. I also increased the dataset size by a further 50%, but even after 350 epochs it still struggles. I wonder if that is simply because, unlike most GAN test datasets, it has far less homogeneity; faces are simply too easy. :) BMSG-GAN is doing a lot better on this dataset than ProGAN, though, which collapsed before it reached the final resolution even with the relativistic hinge loss. I think finding the right number of epochs for each resolution step with ProGAN is tricky. I am training StyleGAN to compare, and it seems to fare better than ProGAN.
Hello @akanimax, I'm using your BMSG-GAN repo for a text-to-face task and the mode collapses. You used ProGAN for this task. I've been stuck on this for a long time and am exhausted. Any advice? Thank you.
Hi @akanimax, I am trying your project on my data (>11k images at resolutions higher than 512x512).
Could you please suggest the easiest way to make it work? I started it but got bad results, possibly due to a small number of epochs (I used 100). Recently I started again with all defaults but num_epochs=2000. An epoch takes a lot of time on my setup (2x 2080 Ti); could you please suggest optimal parameters for my case?
I am particularly interested in latent_size=512, which as I understand it determines the overall model size? More generally, do GANs need more than the classic 100-element z noise, and if so, why?
I also have a more complex and more general question: I need a way to place some predefined objects (digits, in my case) in the image. How could I do that? Could I produce a layout (using some other GAN or just a formula) and add something like a "detector" to the critic to add a loss on detections?
Hi @akanimax, I started learning about GANs recently and I found this model really cool, great job! I have a question regarding the loss function. I've been following this discussion closely and you mention that with the RAHinge loss, it's expected for the discriminator to reach 0.0 loss early on in training. Could you comment a bit about how the generator loss should behave? I'm currently training a conditional version of this GAN for medical image synthesis and I notice that the discriminator reaches 0.0 but the generator loss increases gradually, as shown below:
It must be noted that in this plot, I am showing the loss per epoch (averaging the loss over all batches).
Despite this behavior in the learning curves, the images look reasonable on a quick visual inspection, so I am not sure whether there is some underlying issue like mode collapse or divergence. Is there a way to tell this from the learning curves?
Thanks again!
- Advaith