lightweight-gan
loss implementation differs from paper
Hi,
Thanks for this amazing implementation! I have a question about the loss implementation, as it seems to differ from the equations in the original paper. The screenshot below shows the GAN loss as presented in the paper:
- in red, the discriminator loss (D loss) on the true labels,
- in green the D loss on labels for fake generated images,
- and in blue, the generator loss (G loss) on labels for fake images.
This makes sense to me. Assuming that D outputs values between 0 and 1 (0 = fake, 1 = real):
- in red, we want D to output 1 for true images → if D indeed outputs 1 for true images, then -min(0, -1 + D(x)) = 0, which is the minimum achievable,
- in green, we want D to output 0 (from the discriminator's perspective) for fake images → if D indeed outputs 0 for fake images, then -min(0, -1 - D(x^)) = 1, which is the minimum achievable if D only outputs values between 0 and 1,
- in blue, we want D to output 1 (from the generator's perspective) for fake images: the equation follows directly.
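As a sanity check on the three terms above, here is a minimal per-sample sketch in plain Python. The function names are illustrative and do not come from the paper or the repo, and the generator term is assumed to be the usual hinge form -D(x^), since the bullet only says that it follows directly.

```python
def paper_d_loss_real(d_real):
    # Red term: -min(0, -1 + D(x)); zero once D(x) >= 1.
    return -min(0.0, -1.0 + d_real)


def paper_d_loss_fake(d_fake):
    # Green term: -min(0, -1 - D(x^)); zero once D(x^) <= -1.
    return -min(0.0, -1.0 - d_fake)


def paper_g_loss(d_fake):
    # Blue term (assumed form): -D(x^); decreases as D(x^) grows.
    return -d_fake


def mean(values):
    # Batch average, standing in for the expectation in the equations.
    return sum(values) / len(values)


def paper_d_loss(d_real_batch, d_fake_batch):
    # Full discriminator hinge loss over a batch of scalar D outputs.
    real_term = mean([paper_d_loss_real(v) for v in d_real_batch])
    fake_term = mean([paper_d_loss_fake(v) for v in d_fake_batch])
    return real_term + fake_term
```

With D restricted to [0, 1], the green term bottoms out at 1 for D(x^) = 0, as computed above. If D may output any real value, the same term reaches 0 at D(x^) = -1.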
Now, the way the authors implement this in the code provided in the supplementary materials of the paper is as follows (the colors match the ones in the above picture):
Apart from the odd randomness involved (already explained in https://github.com/lucidrains/lightweight-gan/issues/11), their implementation is a one-to-one match with the paper's equations.
The way it is implemented in this repo, however, is quite different, and I do not understand why.
Let's start with the discriminator loss:
- in red, you want D to output small values (negative if allowed), to make this term as small as possible (0 if D can output negative values),
- in green, you want D to output values as large as possible (greater than or equal to 1) to cancel this term out as well.
For the generator loss:
- in blue, you want the opposite of green, that is, for D to output values as small as possible.
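To make the comparison concrete, here is a minimal sketch in plain Python. It assumes the repo's terms have the form relu(1 + D(x)), relu(1 - D(x^)) and D(x^), which is how I read the bullets above, and it is not copied from the repo. The paper terms are written in their equivalent max form, and all function names are illustrative. Under these assumptions, the two versions differ only by the sign convention of D.

```python
def relu(v):
    # Rectifier on a single scalar.
    return max(0.0, v)


# Paper convention: large D means "real".
def paper_d_real(d_real):
    return relu(1.0 - d_real)   # same as -min(0, -1 + D(x))


def paper_d_fake(d_fake):
    return relu(1.0 + d_fake)   # same as -min(0, -1 - D(x^))


def paper_g(d_fake):
    return -d_fake              # assumed generator hinge term


# Sign-flipped convention (assumed from the bullets): small D means "real".
def flipped_d_real(d_real):
    return relu(1.0 + d_real)   # zero once D(x) <= -1


def flipped_d_fake(d_fake):
    return relu(1.0 - d_fake)   # zero once D(x^) >= 1


def flipped_g(d_fake):
    return d_fake               # generator pushes D(x^) down


# Replacing D by -D maps one convention onto the other.
for v in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert flipped_d_real(v) == paper_d_real(-v)
    assert flipped_d_fake(v) == paper_d_fake(-v)
    assert flipped_g(v) == paper_g(-v)
```

If this reading is right, a discriminator trained with the flipped terms would simply learn the negated score, so both versions should describe the same optimization problem.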
The repo's implementation seems meaningful and yields coherent results (as the examples show). It also seems to me that D is not limited to outputting values between 0 and 1, but can output any real value (I might be wrong). I am just wondering: why this choice? Could you perhaps elaborate on why you decided to implement the loss differently from the original paper?
I think it was just taken from some other article. You can see some elements of WGAN-GP in this code, such as a simplified implementation of the gradient penalty. Also, this code contains multiple losses (the user can choose dual contrastive loss instead of hinge loss). It may have been implemented this way so that a single training loop can serve several loss functions.