
Why is Discriminator Loss 0 a failure mode?

Open arijitx opened this issue 6 years ago • 15 comments

In tip 10 you say that discriminator loss 0 is a failure mode, but in the paper they say this (see attached image):

What am I getting wrong here?

Thanks,

arijitx avatar Mar 25 '18 07:03 arijitx

I think the whole point of GANs is to have losses that counterbalance one another. Unlike traditional CNNs, we are not dealing with a single loss that we want to drive as low as possible. The error you show from the paper is indeed D's loss, but you must also consider G's loss, which is the opposite of D's (this is not exactly true and is implementation-dependent, but the intuition is that loss D = - loss G). Therefore, in a GAN you don't want D's loss to go to zero, because that would mean D is doing too good a job (and, most importantly, G too bad a one): D can easily discriminate between fake and real data, i.e. G's creations are not close enough to real data.

To sum it up, it is important to define D's loss that way because we do want D to try to reduce it, but the ultimate goal of the whole G-D system is to have the losses balance out. Hence if one loss goes to zero, it's a failure mode (no more learning happens).
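
For reference, the minimax value function from the original GAN paper (Goodfellow et al., 2014) makes this precise:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$

D is trained to maximize V while G is trained to minimize it, which is where the "loss of D = minus loss of G" intuition comes from; in practice G is often trained to maximize log D(G(z)) instead, so the symmetry is only approximate.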

ghost avatar Apr 05 '18 15:04 ghost

Hence if one loss goes to zero, it's a failure mode (no more learning happens).

I wouldn't say that no more learning happens. For instance, suppose the discriminator's loss goes to 0 at the beginning. The generator then improves, and in the next iteration the synthetic observations are good enough to fool the discriminator, so its loss increases.

Generally, I would focus on the training process being stable. My understanding is that at the very beginning the discriminator's accuracy should be high (say 90%), meaning that it separates fake observations from real ones well. Then its loss should steadily increase as the generator improves.

The perfect (final) state is when you:

  • have 100% accuracy for the generator, meaning the discriminator classifies all synthetic observations as real;
  • have about 50% accuracy for the discriminator, meaning it cannot distinguish fake observations from real ones;
  • have synthetic observations of good quality.

The last point however is another story.
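
For illustration, a minimal PyTorch-style sketch of how one might monitor that discriminator accuracy during training (the names `D`, `real_batch`, and `fake_batch` are placeholders, not anything from this thread):

```python
import torch

@torch.no_grad()
def discriminator_accuracy(D, real_batch, fake_batch):
    """Fraction of real samples scored as real and fake samples scored as fake.

    Assumes D returns raw logits (no final sigmoid). Accuracy near 100%
    suggests D is winning; accuracy hovering around 50% suggests balance.
    """
    real_as_real = torch.sigmoid(D(real_batch)) > 0.5
    fake_as_real = torch.sigmoid(D(fake_batch)) > 0.5
    correct = real_as_real.sum() + (~fake_as_real).sum()
    total = real_as_real.numel() + fake_as_real.numel()
    return (correct.float() / total).item()
```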

mateuszkaleta avatar Jul 25 '18 07:07 mateuszkaleta

@mateuszkaleta AFAICT, if the discriminator loss goes to zero, there are no more loss gradients flowing (since these gradients are derivatives of the loss), so the weights of D and G are not modified, and G cannot "get improved in the next iteration" as you propose.
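
A tiny numerical illustration of that point for the original (minimax) generator loss log(1 - D(G(z))): once the discriminator's sigmoid output saturates, essentially no gradient reaches the generator (the logit value below is just an illustrative extreme):

```python
import torch

# Pretend D is extremely confident that a generated sample is fake:
# a large negative logit, so sigmoid(logit) is ~0 and D's loss on it is ~0.
logit = torch.tensor([-30.0], requires_grad=True)

# The minimax generator loss on that sample.
g_loss = torch.log(1 - torch.sigmoid(logit))
g_loss.backward()

print(logit.grad)  # ~ -1e-13: practically no learning signal for G
```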

ghost avatar Jul 25 '18 08:07 ghost

What should I do to prevent this failure mode? Does anyone have any suggestions? Thanks!

sunbau avatar Dec 23 '18 20:12 sunbau

When I trained a DCGAN on a celebrity face dataset, my discriminator's loss quickly converged to zero and no more learning happened. But I was able to solve this problem in my case.

The error was that I was using a sigmoid layer at the discriminator output and applying binary cross-entropy (BCE) loss to that output. When I removed the sigmoid layer and instead computed BCE directly on the logits, it worked like a charm.

This is a well-known numerical-instability problem when dealing with exponentials and logarithms. Essentially, very large positive logits were rounded to a probability of 1 and very large negative ones to 0. This doesn't happen when BCE is computed directly on the logits, because that implementation uses the log-sum-exp trick.

It's also my understanding that the loss can never truly reach zero, since the logits can't be -inf or +inf, so there must be some rounding going on whenever you see exactly zero loss.
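
A quick way to see the effect (a hedged sketch; the logit value is just an illustrative extreme):

```python
import torch
import torch.nn.functional as F

# A confidently wrong prediction: large positive logit, but the true label is 0.
logit = torch.tensor([30.0])
target = torch.zeros(1)

# Sigmoid followed by BCE: in float32, sigmoid(30) rounds to exactly 1.0,
# so -log(1 - p) would be infinite; PyTorch clamps it instead, and the
# true magnitude of the error is lost once the sigmoid saturates.
loss_saturated = F.binary_cross_entropy(torch.sigmoid(logit), target)

# BCE computed directly on the logit uses the log-sum-exp trick internally,
# so the loss comes out as roughly 30, as it should.
loss_stable = F.binary_cross_entropy_with_logits(logit, target)

print(loss_saturated.item(), loss_stable.item())
```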

KrnTneja avatar May 16 '19 08:05 KrnTneja

@KrnTneja: Thanks for the trick. Could you provide some code for it? I'm also running into the problem of D's loss going to zero.

John1231983 avatar May 16 '19 15:05 John1231983

@KrnTneja: Thanks for the trick. Could you provide some code for it? I'm also running into the problem of D's loss going to zero.

There isn't really any code to show. Just ensure that the last layer of your discriminator is not a sigmoid layer, i.e. its output shouldn't be constrained to [0, 1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.
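
In case it helps to see the change spelled out anyway, a minimal hedged sketch of what this looks like in PyTorch (the layer sizes are arbitrary placeholders):

```python
import torch.nn as nn

# Discriminator head WITHOUT a final nn.Sigmoid(): it outputs raw logits.
discriminator = nn.Sequential(
    nn.Linear(784, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # no nn.Sigmoid() here
)

# Pair the raw logits with BCEWithLogitsLoss instead of Sigmoid + BCELoss.
criterion = nn.BCEWithLogitsLoss()
```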

KrnTneja avatar May 20 '19 08:05 KrnTneja

Hey, have you found any solution to this? I'm having the same issue, and because of it I'm not getting any usable generated images.

arpita739 avatar May 13 '20 15:05 arpita739

Discriminator loss 0 means the discriminator is easily spotting the generator's images. This can happen in some cases, for example when the generator leaves checkerboard artifacts.

moulicm111 avatar Jun 07 '20 10:06 moulicm111

This may also occur when the total generator loss is a sum of two losses and the generator concentrates on minimizing the other loss because its weighting factor is larger.
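
For example (a hypothetical sketch; the loss names and weights are illustrative, not from any particular codebase):

```python
import torch

# Suppose the generator objective is a weighted sum of an adversarial term
# and a reconstruction term. If recon_weight dwarfs adv_weight, G mostly
# optimizes reconstruction and stops fighting D, so D's loss can fall to ~0.
adversarial_loss = torch.tensor(0.7)
reconstruction_loss = torch.tensor(0.05)
adv_weight, recon_weight = 1.0, 100.0

g_loss = adv_weight * adversarial_loss + recon_weight * reconstruction_loss
```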

moulicm111 avatar Jun 07 '20 11:06 moulicm111

I face the same problem when training a CycleGAN, with torch.sigmoid(D(fake_img)) and BCELoss() as the GAN loss, and eventually my G fell into a failure mode... Now I'm trying BCEWithLogitsLoss() to see what happens. Hope it works, and thank you @KrnTneja!

DISAPPEARED13 avatar Oct 02 '21 07:10 DISAPPEARED13

@KrnTneja: Thanks for the trick. Could you provide some code for it? I'm also running into the problem of D's loss going to zero.

There isn't really any code to show. Just ensure that the last layer of your discriminator is not a sigmoid layer, i.e. its output shouldn't be constrained to [0, 1]. I was using PyTorch, where I had to use torch.nn.BCEWithLogitsLoss instead of torch.nn.BCELoss.

What's the difference between combining BCEWithLogitsLoss with logit outputs and combining BCELoss with sigmoid outputs?

6xw avatar Mar 14 '22 13:03 6xw

@6xw When you use BCEWithLogitsLoss, the loss is computed with the log-sum-exp trick, which prevents overflow and thus increases numerical stability.
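
Concretely, for a logit $x$ and label $y \in \{0, 1\}$, implementations of BCE on logits typically use the numerically stable form

$$\ell(x, y) = \max(x, 0) - x\,y + \log\bigl(1 + e^{-|x|}\bigr),$$

which never exponentiates a large positive number, so it cannot overflow or silently saturate the way sigmoid followed by log can.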

Eliacus avatar Apr 21 '22 09:04 Eliacus

@KrnTneja @mateuszkaleta

I have commented out the Sigmoid layer in the discriminator and used BCEWithLogitsLoss and the Adam optimizer with a learning rate of 0.0001, but the discriminator loss still reaches zero after 30 epochs. Is there any way to fix that?

Pravin770 avatar Aug 04 '22 15:08 Pravin770

@KrnTneja @mateuszkaleta

I have commented out the Sigmoid layer in the discriminator and used BCEWithLogitsLoss and the Adam optimizer with a learning rate of 0.0001, but the discriminator loss still reaches zero after 30 epochs. Is there any way to fix that?

Did you ever find a solution?

Raha304 avatar Oct 31 '22 12:10 Raha304