
WGAN/WGAN-GP in CycleGAN model

Kaede93 opened this issue · 11 comments

Hello, I noticed that you added a WGAN-GP loss option to CycleGAN.

I am wondering whether the generator will oscillate during training when using the WGAN or WGAN-GP loss instead of the LSGAN loss, since the WGAN loss can take negative values.

I replaced the LSGAN loss with the WGAN/WGAN-GP loss (all other parameters and model structures unchanged) for the horse2zebra transfer task, and found that the model could not be trained:

1. The Wasserstein distance was a very small number (about 1e-4) at the beginning of training. Is that normal? The WD value was large (about 1e-1 to 1e0) when I trained the original WGAN on a noise-to-image task.

2. The discriminator/generator losses oscillated heavily and I could not see any sign of the Wasserstein distance decreasing. I tried adjusting the learning rate, but it didn't help. Can you give me some advice?

3. I used Keras, so I set the labels of real and generated images to 1 and 0 for LSGAN, respectively, and -1 and 1 for WGAN, with the WGAN loss defined as K.mean(y_true * y_pred) (a minimal sketch follows this list). Could this setting lead to bad results? I found that the accuracy of the discriminator was nearly 0% when using the WGAN/WGAN-GP loss (30% to 90%+ when using the LSGAN loss).
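For reference, here is a minimal sketch of the Keras-style loss I mean; the function name and the ±1 labeling scheme are my own illustration, not code from this repo:

```python
# Minimal sketch of the Keras-style WGAN critic loss described above
# (real labeled -1, fake labeled +1); illustration only, not repo code.
import tensorflow.keras.backend as K

def wasserstein_loss(y_true, y_pred):
    # The critic outputs an unbounded score, not a probability, so
    # classification "accuracy" is not meaningful for this loss.
    return K.mean(y_true * y_pred)
```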

One more thing: the generator loss is wgan + cyc, and I am wondering whether the negative WGAN loss value confuses the generator. I think that when the WGAN loss is negative, the total loss can still get smaller no matter whether the cycle loss grows or shrinks compared with the previous training step.

Am I misunderstanding something? Please correct me if I am wrong. Thank you for your time!

Kaede93 avatar Jul 25 '20 13:07 Kaede93

The WGAN loss itself doesn't work without GP. Even with GP, we haven't made it work better than the vanilla CycleGAN/pix2pix. The loss is also not very stable or meaningful for us. The WGAN-GP loss was added to the repo in case users want to use it for other models. There are two possible reasons: (1) the PatchGAN discriminator is already quite weak, so adding the GP loss makes it too weak compared to the generator; (2) the GP loss assumes that the inputs are independent according to the original paper, while PatchGAN takes overlapping patches, which breaks this assumption.
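For context, here is a minimal PyTorch sketch of the standard WGAN-GP gradient penalty (Gulrajani et al., 2017); it is an illustration, not necessarily identical to this repo's implementation:

```python
# Minimal sketch of the standard WGAN-GP gradient penalty; an
# illustration, not necessarily identical to this repo's implementation.
import torch

def gradient_penalty(netD, real, fake, lambda_gp=10.0):
    # Random interpolation between real and fake samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    d_out = netD(interp)
    grads = torch.autograd.grad(
        outputs=d_out, inputs=interp,
        grad_outputs=torch.ones_like(d_out),
        create_graph=True, retain_graph=True)[0]
    grads = grads.view(real.size(0), -1)
    # With a PatchGAN critic, d_out is a map of overlapping-patch scores,
    # which is exactly where the independence assumption breaks down.
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```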

junyanz avatar Jul 25 '20 23:07 junyanz

Thank you for your reply; I agree with your points. I also think the batch size is one of the factors that makes CycleGAN-with-WGAN-GP training very unstable. Do you agree? I'm trying a batch size of 64 instead of 1 (with InstanceNorm); is that worth trying?

I am also wondering whether you obtained any reasonable images using the WGAN-GP loss (even though training was not very stable). If yes, how many training steps did it take? I trained CycleGAN with the WGAN/WGAN-GP/WGAN-DIV losses, and their results were all very bad (just a noise map, or sometimes something like "ghost horses"). It seems the discriminators were too weak to give any useful guidance to the generators.

I also realized that the WGAN losses (including the GP and DIV versions) are very sensitive to the network structure and input size. It seems it is not easy to apply the WGAN loss to other models.

Kaede93 avatar Jul 26 '20 05:07 Kaede93

Unfortunately, we don't have reasonable images with WGAN-GP. I am not sure whether it is related to batch_size. As I mentioned, GP seems incompatible with PatchGAN. The loss might work for other types of discriminators and tasks.

junyanz avatar Jul 28 '20 03:07 junyanz

Thank you for your reply again.

I tried a batch size of 64; it still failed to produce any reasonable results (just a noise map), even though the Wasserstein distance was more stable than with a batch size of 1, and it was decreasing. The discriminator still seems too weak to feed back any useful gradients to the generator, so the results of this experiment might not be meaningful.

I also replaced the PatchGAN discriminator with DCGAN's, but the model could not be trained either.

Is that why you used the LSGAN loss instead of the WGAN loss in the paper?

Thank you for your time; have a nice day.

Kaede93 avatar Jul 28 '20 04:07 Kaede93

Yes, we found that LSGAN is more effective in our paper.

junyanz avatar Jul 29 '20 13:07 junyanz

Sorry for the late reply.

You mentioned in another issue that you got better results using the ResNet rather than the U-Net architecture. What does "better" mean there? In my experiments (on the horse2zebra dataset), I got better style transfer results with ResNet but better image quality with U-Net.

I am wondering whether I should build the network with U-Net when I want to preserve the background details (especially the color information) as much as possible, or can you give me some tips? I also added SSIM and perceptual losses to the loss function (a sketch of the perceptual term follows), but the reconstruction of the background color is still not very good.
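For reference, here is a minimal Keras sketch of the kind of perceptual loss term I mean; the VGG16 layer choice is just an example of my own, not this repo's method:

```python
# Minimal Keras sketch of a VGG16-based perceptual loss; the layer
# ('block3_conv3') is an example choice, not from this repo.
import tensorflow.keras.backend as K
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

base = VGG16(include_top=False, weights='imagenet')
feat = Model(base.input, base.get_layer('block3_conv3').output)
feat.trainable = False  # frozen feature extractor

def perceptual_loss(y_true, y_pred):
    # L1 distance between frozen VGG feature maps.
    return K.mean(K.abs(feat(y_true) - feat(y_pred)))
```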

Kaede93 avatar Aug 06 '20 00:08 Kaede93

What is the difference between better style transfer results and better image quality results? The background color is supposed to change, as the color distributions of horse and zebra backgrounds are different. You can use an object mask if your goal is to keep the background color (a compositing sketch follows). See this nice work for an example.
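As a hedged sketch (assuming you have a binary foreground mask, e.g. from an off-the-shelf segmentation model), the compositing step could look like:

```python
# Sketch of mask-based compositing: keep the generated object, copy
# the background from the input. `mask` is 1 on the object, 0 elsewhere.
import torch

def composite(real, fake, mask):
    return mask * fake + (1.0 - mask) * real
```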

junyanz avatar Aug 06 '20 05:08 junyanz

Good Share

qsunyuan avatar Nov 02 '21 08:11 qsunyuan

Thanks for sharing 👍

joonas-yoon avatar Jun 23 '22 08:06 joonas-yoon

Hello,

I am currently trying to train WGAN-GP with a patch discriminator, but I cannot make it work. I also tried non-overlapping convolutions, but it didn't help much. Has anyone found a way to train a good model, or any ideas why WGAN doesn't work with a patch discriminator?

bgjeroska avatar Jun 28 '22 19:06 bgjeroska

@bgjeroska

I think the comment above answers your question:

> There are two possible reasons: (1) the PatchGAN discriminator is already quite weak, so adding the GP loss makes it too weak compared to the generator; (2) the GP loss assumes that the inputs are independent according to the original paper, while PatchGAN takes overlapping patches, which breaks this assumption.

joonas-yoon avatar Jun 29 '22 13:06 joonas-yoon

Hi there. I am aiming for more stable training and a stronger D for clearer synthesis. Would simply adding GP to PatchGAN work? Thanks.

DISAPPEARED13 avatar Nov 25 '22 14:11 DISAPPEARED13