Discriminator Loss Bug
As referenced in #93, the discriminator is the key to getting sharp images. But in my experiments, aeloss sometimes takes negative values.
I think that in any network design a negative loss should be avoided, because it could run off to negative infinity.
The negative term comes from `g_loss`, computed as `g_loss = -torch.mean(logits_fake)`, where `logits_fake` is the raw value after the final convolution, with no sigmoid to limit the output logits. When the generator successfully produces a correct patch, such as pure white space, the discriminator can exploit that patch by driving its logit toward infinity, which encourages the generator to draw more patches like this and to completely ignore the other loss terms.
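To illustrate the concern, here is a minimal sketch (shapes and values are made up) showing that this generator term has no lower bound when computed on raw logits:

```python
import torch

# Illustrative logit map from a PatchGAN-style discriminator; the shape is made up.
logits_fake = torch.randn(4, 1, 30, 30)

# The generator term quoted above: no sigmoid, just the negated mean of the raw logits.
g_loss = -torch.mean(logits_fake)

# Since the logits are unbounded, so is g_loss: if the discriminator pushes the
# logit of some patch toward +inf, g_loss heads toward -inf, which is the
# failure mode described above.
print(g_loss.item())
```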
After checking the original code in the CycleGAN repo, there does exist an outer loss term that limits the patch logits, but only if gan_mode is in ['wgangp']. I did not see any such 'wgangp' handling here, so this should be a bug.
This is not a bug: the loss used in the paper is the hinge GAN loss, which can have negative values for g_loss. If training is stable and correct, the loss won't go to negative infinity.
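For context, a minimal sketch of the hinge GAN objective (discriminator plus the matching generator term); the function names here are illustrative, though as far as I can tell this matches the `hinge_d_loss` in `vqperceptual.py`:

```python
import torch
import torch.nn.functional as F

def hinge_d_loss(logits_real, logits_fake):
    # Hinge loss for the discriminator: each term is clamped at zero once the
    # margin of 1 is reached, so the discriminator gains nothing from pushing
    # logits toward +/- infinity.
    loss_real = torch.mean(F.relu(1.0 - logits_real))
    loss_fake = torch.mean(F.relu(1.0 + logits_fake))
    return 0.5 * (loss_real + loss_fake)

def hinge_g_loss(logits_fake):
    # Matching generator term: -E[D(G(z))]. It goes negative as soon as the
    # average fake logit is positive, so negative values are expected.
    return -torch.mean(logits_fake)
```

Because the hinge terms saturate at a margin of 1, the discriminator has no incentive to blow its logits up, which is why a stably trained model does not drive g_loss to negative infinity.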
Hi, I'm curious about how `g_loss` is derived without using a sigmoid or softplus function. After searching for several hours, I cannot find any reference for using logits directly like in this implementation.
https://github.com/CompVis/taming-transformers/blob/1bbc027acb6a47e4eb348d611f9af53f1038ffee/taming/modules/losses/vqperceptual.py#L98
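For comparison, a sigmoid/softplus-based (non-saturating BCE) generator loss is bounded below, while the raw-logit form on that line is not; a minimal sketch with illustrative function names:

```python
import torch
import torch.nn.functional as F

def bce_g_loss(logits_fake):
    # Non-saturating BCE generator loss: -log(sigmoid(D(G(z)))) == softplus(-logits).
    # Always >= 0, so it cannot run off to negative infinity.
    return torch.mean(F.softplus(-logits_fake))

def raw_logit_g_loss(logits_fake):
    # The form on the linked line: shared by the hinge and Wasserstein
    # generator objectives, and unbounded below.
    return -torch.mean(logits_fake)
```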
@SuwoongHeo That seems to be the Wasserstein loss. I found an implementation here, from the official repository of the paper Improved Training of Wasserstein GANs.
The description in the VQGAN paper seems inaccurate: it claims to use a binary cross-entropy loss following PatchGAN.
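For what it's worth, the confusion is understandable: the WGAN generator term is written exactly the same way as the hinge one, and only the critic/discriminator side differs. A sketch, with the WGAN-GP gradient penalty omitted:

```python
import torch

def wgan_d_loss(logits_real, logits_fake):
    # WGAN critic loss: approximately maximizes E[D(x)] - E[D(G(z))];
    # in WGAN-GP a gradient penalty (not shown) keeps the critic 1-Lipschitz.
    return torch.mean(logits_fake) - torch.mean(logits_real)

def wgan_g_loss(logits_fake):
    # Same expression as the hinge generator term: -E[D(G(z))].
    return -torch.mean(logits_fake)
```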
I wonder why they didn't simply stick with what's stated in the paper. I spent a few hours trying to figure this out and was lucky to stumble upon your comment; imagine if I had started a few weeks earlier, it would have been a nightmare.