Variational_Discriminator_Bottleneck
Resizing support/progressive growing support
Hiya. I've been experimenting with generating anime faces with GANs for years now, and I've been trying out your GAN implementation to compare with another (more complex) VGAN implementation by nsheppard (currently called PokeGAN), using ~137k anime faces extracted from my Danbooru2017 dataset. The results so far are as good as anything I've gotten aside from Nvidia's ProGAN implementation; samples from a few minutes ago, after about 2 days of training:

For speed, I am using 128px. At some point, once it has converged, it would be nice to switch to 256px and then 512px without restarting from scratch. Some support for adding additional layers would be great; the exact 'blending' ProGAN does might be a bit hard to implement, but almost as good would be freezing the original layers for a while and training only the new ones; even just slapping on more layers would be better than restarting from scratch (given how drastically minibatch sizes shrink at higher resolutions).
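The simpler freezing approach can be sketched in PyTorch (a sketch with hypothetical names, not code from either implementation):

```python
import torch.nn as nn

def grow_and_freeze(old_blocks, new_block):
    """Crude alternative to ProGAN's blending (hypothetical helper):
    freeze the already-trained blocks and train only the newly added
    higher-resolution block for a while."""
    for block in old_blocks:
        for p in block.parameters():
            p.requires_grad = False  # old layers stay fixed
    return nn.ModuleList([*old_blocks, new_block])
```

Once the new block has caught up, the frozen blocks could be unfrozen again for joint fine-tuning.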
On a side note, there's one trick nsheppard found which might be useful. We don't know if the official VGAN code does this, because it seems kind of obvious in retrospect, but they haven't released source AFAIK. In the bottleneck loss, it seems that occasionally i_c can be larger than the KL term, in which case the loss goes negative. It's not clear what this means or whether it could be desirable, so he simply bounded it at zero:
- bottleneck_loss = (th.mean(kl_divergence) - i_c)
+ bottleneck_loss = max(0, (th.mean(kl_divergence) - i_c)) # EDIT: per nsheppard, ensure that the loss can't be negative
It seems to permit stable training with much higher values of i_c.
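Putting the whole thing together, the clamped bottleneck loss might look like this (a sketch; the KL term is the standard closed form against a unit-Gaussian prior, and the argument names are assumptions, not from either codebase):

```python
import torch as th

def bottleneck_loss(mus, sigmas, i_c):
    """VDB bottleneck loss with nsheppard's clamp: mean KL(q(z|x) || N(0, I))
    minus the information constraint i_c, floored at zero so it can't go
    negative. mus/sigmas are the discriminator's bottleneck encoder outputs."""
    # closed-form KL between N(mu, sigma^2) and the standard normal, per sample
    kl_divergence = 0.5 * th.sum(
        mus ** 2 + sigmas ** 2 - th.log(sigmas ** 2) - 1, dim=1)
    # clamp at zero: no penalty once the mean KL is already under budget
    return th.clamp(th.mean(kl_divergence) - i_c, min=0)
```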
@gwern,
Hi! Thank you for the comment! Firstly, two questions: 1.) Were the samples generated using my code or the PokeGAN code? (it's a bit ambiguous there) 2.) Where is nsheppard's code? I couldn't find it here.
I understand what you are proposing with the progressive growing. I have actually already written the blending procedure for ProGAN in my pro_gan_pytorch package, so I could reuse it here.
I'll also try nsheppard's bottleneck_loss edit; I can see how that might help. Thanks!
Also, could you tell me a bit more about what exactly nsheppard's VGAN does differently from my code? That would be really helpful.
Thank you!
Best regards, @akanimax
- yours
- currently not public, working on getting permission
- great!
- nsheppard adds self-attention layers, so it's comparable to SAGAN or BigGAN in not being just vanilla resnet upscaling blocks; self-attention should be better, and in my face runs it does very quickly manage to avoid artifacts like 3 eyes (which persisted for a long time with vanilla VGAN), but the results are overall frustratingly low-quality. There's some basic code for progressive growing: it's not automatically supported, but you can start a run at a higher resolution and it'll fill in random layers if necessary. The VDB beta is increased, because the max tweak seems to make training stabler. The discriminator mean value is used to reduce noise. He also has some code for 'inverting' the generator to generate prototypical images or something.
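nshepperd's exact self-attention layer isn't public, but the standard SAGAN-style block it would resemble can be sketched like so (a sketch only, not his code):

```python
import torch as th
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial positions of a feature map.
    gamma starts at zero, so the block is initially an identity and the
    attention contribution is learned gradually."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(th.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w)   # B x C/8 x N
        k = self.key(x).view(b, -1, h * w)     # B x C/8 x N
        v = self.value(x).view(b, c, h * w)    # B x C   x N
        # attention over all N = h*w positions
        attn = th.softmax(th.bmm(q.transpose(1, 2), k), dim=-1)  # B x N x N
        out = th.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out
```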
To clarify, the max(0, ·) trick is only applied to the bottleneck loss that goes into training the discriminator. The update step for beta should use the unadulterated th.mean(kl_divergence) - i_c, so that when the KL divergence is less than i_c, beta is still slowly reduced. So with your code I would edit it like:
- total_loss = loss + (beta * bottleneck_loss)
+ total_loss = loss + (beta * max(0, bottleneck_loss))
And bottleneck_loss itself would be left alone.
The main benefit of the max(0, ·) trick here is that it seems to make it less harmful when the VDB beta is too high. In turn, this means you can safely increase alpha (the learning rate for beta) and have the bottleneck reach equilibrium much faster, if you're impatient like me. To that end, I also edit the update step like so:
- beta_new = max(0, beta + (self.alpha * bottleneck_loss))
+ beta_new = max(0, beta + (self.alpha * min(1.0, bottleneck_loss)))
This edit makes sure that beta doesn't explode at the start of training (even with higher values of alpha), so even if beta overshoots and gets too large, it comes back down to the optimal value in a reasonable period of time.
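Both tweaks together amount to something like this single step function (a sketch; variable names follow the diffs above, and the exact surrounding training loop is assumed):

```python
import torch as th

def vdb_step(loss, kl_divergence, beta, i_c, alpha):
    """One VDB dual-gradient step with both tweaks:
    - the discriminator objective uses the *clamped* bottleneck loss;
    - the beta update uses the *unclamped* constraint (so beta can still
      decay when the KL is under budget), additionally capped at 1.0 so
      beta can't explode early in training."""
    bottleneck_loss = th.mean(kl_divergence) - i_c          # unclamped
    total_loss = loss + beta * th.clamp(bottleneck_loss, min=0)
    beta_new = max(0.0, beta + alpha * min(1.0, bottleneck_loss.item()))
    return total_loss, beta_new
```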
Sorry for the late reply. I was laden with too much work and was also a bit unwell on the weekend.
@gwern, thank you for trying out my code and letting me know your thoughts on it. I hope it helped you. Also, thank you for the clarification about @nshepperd's new code.
@nshepperd, thanks a ton for the intuitive explanation. This clears a lot of things up. I'll definitely try all these modifications. By the way, I am curious about your generator-inversion technique. Could you please explain it?
Best regards, @akanimax