
Inconsistent loss function from the paper?

Open realcrane opened this issue 6 years ago • 8 comments

Hi,

I don't use torch a lot, and I have a question about the implementation of the discriminator loss at line 211, errD = errD_real - errD_fake, where errD_real is the critic's score on real samples (backpropagated at line 202 via errD_real.backward(one)) and errD_fake is the score on fake samples (errD_fake.backward(mone)). In the paper, however, it seems that errD should be maximized, while here it is minimized?

Thanks

realcrane avatar Feb 13 '19 11:02 realcrane

I have the same question. My guess is that because the code flips the sign of both the G and D losses, the discriminator simply gives high scores to fake images and low scores to real images instead.

Intuitively, you are just swapping the "labels" of real and fake images.

I don't know if this is correct. Can anyone confirm?

I found another implementation that does exactly what is stated in the paper: link

npmhung avatar Feb 16 '19 08:02 npmhung

@npmhung, thanks for the link. I had a look at it, but I have some further questions about that one too. For instance, it seems that a Sigmoid is used as the last layer for both D and G, whereas the original paper rather suggests it should not be? The way I understand WGAN, the weight clipping constrains the values, not an activation such as Sigmoid. There might be numerical consequences if an activation is used, but that is unclear to me as well.

realcrane avatar Feb 20 '19 18:02 realcrane

Can you point me to the line where they use sigmoid? I can't seem to find one.

npmhung avatar Feb 23 '19 18:02 npmhung

I don't think sigmoid is necessary (at least for the critic net, since it only outputs a score). If sigmoid is used for D, it will even slow down training, since the gradient saturates as D becomes more and more confident. For G, one can use either sigmoid or tanh to output the generated samples, but tanh is better for learning in my opinion. Also, the loss functions for both D and G are the reverse of what is discussed in the paper. However, since both losses are reversed, it still turns out to be correct (the label-flipping explanation above).
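The saturation effect is easy to check numerically. A minimal sketch (my own example, not from either repo), assuming PyTorch is available:

```python
import torch

# sigmoid'(x) = s * (1 - s), so the gradient collapses as |x| grows,
# i.e. as the discriminator's pre-activation scores become more confident.
x = torch.tensor([0.0, 5.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()

grad_at_0, grad_at_5, grad_at_10 = x.grad.tolist()
# grad_at_0 is 0.25 (the maximum of s*(1-s));
# grad_at_10 is on the order of 1e-5, so learning stalls there.
```

This is the usual argument for leaving the critic's output linear: the score is unbounded, and the gradient does not vanish as the critic separates real from fake.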

khoadoan avatar May 05 '20 17:05 khoadoan

After debugging a lot, I found this code is not wrong, just confusing:

                errD_real = netD(inputv)
                errD_real.backward(one)

                # train with fake
                noise.resize_(opt.batchSize, nz, 1, 1).normal_(0, 1)
                noisev = Variable(noise, volatile = True) # totally freeze netG
                fake = Variable(netG(noisev).data)
                inputv = fake
                errD_fake = netD(inputv)
                errD_fake.backward(mone)
                errD = errD_real - errD_fake

Notice that mone here is defined as -1 * one.

In fact, the loss in the paper is: (image not recovered: Algorithm 1 from the WGAN paper)

So it is easier to understand to just call backward() on loss_d and loss_g directly:

#for D
loss_d = errD_real.mean() - errD_fake.mean()
loss_d.backward()
#for G
loss_g = - errD_fake.mean()
loss_g.backward()
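The two styles do accumulate identical critic gradients. A quick sanity check in current PyTorch (my own sketch with a stand-in linear critic, not the repo's DCGAN netD):

```python
import torch

torch.manual_seed(0)
netD = torch.nn.Linear(4, 1)   # stand-in critic, not the repo's netD
real = torch.randn(8, 4)
fake = torch.randn(8, 4)
one = torch.tensor(1.0)
mone = -1 * one

# Style 1: the repo's two backward() calls with explicit gradient tensors
netD.zero_grad()
netD(real).mean().backward(one)
netD(fake).mean().backward(mone)   # accumulates the -1-scaled gradient
grads_repo = [p.grad.clone() for p in netD.parameters()]

# Style 2: a single scalar loss, plain backward()
netD.zero_grad()
loss_d = netD(real).mean() - netD(fake).mean()
loss_d.backward()
grads_scalar = [p.grad.clone() for p in netD.parameters()]

max_diff = max((a - b).abs().max().item()
               for a, b in zip(grads_repo, grads_scalar))
```

Here max_diff comes out at floating-point noise, because backward(v) backpropagates v * d(output)/d(params) and gradients accumulate across the two calls.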

ALLinLLM avatar Nov 28 '20 10:11 ALLinLLM

(quoting @ALLinLLM's comment above)

I think the loss for G should be:

#for G
loss_g = errD_fake.mean()
loss_g.backward()

as the sign is flipped in lines 6 and 11 of the algorithm in the paper.
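The sign question can be checked on a one-step toy example with a linear critic f(x) = w·x starting from w = 0 (a hypothetical sketch, not the repo's code). Under the paper's convention the critic ascends E[f(real)] - E[f(fake)] and G minimizes -E[f(fake)]; under the flipped convention the critic descends that same quantity, so G must minimize +E[f(fake)] to push in the same direction:

```python
import torch

torch.manual_seed(0)
real = torch.randn(16, 3)
fake = torch.randn(16, 3)
lr = 0.1
w0 = torch.zeros(3)               # same (zero) init for both conventions
d = real.mean(0) - fake.mean(0)   # grad of E[f(real)] - E[f(fake)] w.r.t. w

w_paper = w0 + lr * d   # paper: gradient ascent on the critic objective
w_repo = w0 - lr * d    # repo: gradient descent (labels flipped)

# Gradient pushed into a fake sample x by the generator loss:
g_grad_paper = -w_paper   # paper:   loss_g = -f(x)  ->  d loss / d x = -w
g_grad_repo = +w_repo     # flipped: loss_g = +f(x)  ->  d loss / d x = +w
# Starting from the same init, both conventions move G the same way.
```

With loss_g = -errD_fake.mean() under the flipped critic, g_grad_repo would instead be -w_repo, the exact opposite direction, which is the inconsistency being pointed out here.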

zzachw avatar Dec 24 '20 15:12 zzachw

Does that mean the loss actually has nothing to do with the label, like -1 for a fake image?

SuperbTUM avatar Dec 08 '21 15:12 SuperbTUM

(quoting @zzachw's comment above)

Have you tried modifying the code that way? For me it is incorrect, because it causes problems in gradient propagation for the discriminator. I think this is due to the activation functions being defined with inplace=True. One possible fix is to set them to False, but I haven't tried it yet.

SuperbTUM avatar Dec 08 '21 15:12 SuperbTUM