stylegan2-pytorch

Mapping network mode collapse.

Open GLivshits opened this issue 2 years ago • 3 comments

Hi. This is not an issue but a topic for discussion. I've found that on some domains with rich, frequent patterns the mapping network suffers a mode collapse: the generator actually uses the additive noise to generate images rather than the latents, and the variation of the W coordinates along the batch axis is around 1e-5. Also, disabling noise injection at the 4x4 feature map is fine, but when I disable it at 8x8 there is a full mode collapse. Is it normal for noise injection to be SO important for generation?

GLivshits avatar Sep 27 '21 07:09 GLivshits
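The "variation of W along the batch axis" diagnostic mentioned above can be sketched as follows. This is a minimal plain-Python illustration, not code from the repository: `w_batch` is a hypothetical list of latent vectors (one per sample) that one would obtain by passing a batch of z vectors through the mapping network, and the helper names are made up for this example.

```python
import random
import statistics


def batch_std_per_dim(w_batch):
    """Population std of each latent coordinate, taken across the batch axis."""
    dims = zip(*w_batch)  # transpose: one tuple of values per W coordinate
    return [statistics.pstdev(values) for values in dims]


def mean_batch_std(w_batch):
    """Average of the per-coordinate stds; values near 0 suggest a collapsed W."""
    return statistics.fmean(batch_std_per_dim(w_batch))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins (assumptions): 16 samples, 8 latent dimensions each.
    healthy = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
    collapsed = [[0.5 + rng.gauss(0.0, 1e-5) for _ in range(8)] for _ in range(16)]
    print("healthy  :", mean_batch_std(healthy))
    print("collapsed:", mean_batch_std(collapsed))
```

With real tensors the same statistic is just the standard deviation over the batch dimension followed by a mean; a value on the order of 1e-5 would match the collapse described in the comment above.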

Hmm, I think the impact of noise injection is not very large if training goes smoothly. But I think a model could end up trained so that noise injection plays a crucial role in generation. It may be because it would be better to have a spatial dimension in the latent codes. (You can refer to https://github.com/naver-ai/StyleMapGAN.) Or it may be due to the low learning rate of the mapping network. You can increase the learning rate of the mapping network and reduce the number of layers.

rosinality avatar Sep 27 '21 12:09 rosinality

I've used a 0.1 lr multiplier for the mapping network. Note that this is not valid for all domains. Nevertheless, the more noise I disable, the more uniform the images are. My particular domain (256x256 image size) is full of sharp details and high-frequency content. Maybe StyleGAN is not capable of generating such images using just the style vector.

GLivshits avatar Sep 28 '21 08:09 GLivshits

I think I've found the issue. In your code (and in Nvidia's original code) the blur kernel [1,3,3,1] is used. First of all, I don't see any reason to apply blur at resolutions below 16, because the frequency domain changes A LOT. Secondly, such blurring destroys all high-frequency components. If the target domain contains high frequencies, the only way for the generator to fool the discriminator is to rely on noise injection at fine levels (for me, it's 128). If I change the noise at this level, the image changes completely. So one should be very careful with blurring and use an odd filter size, because an even size leads to just averaging the 4 central elements. This problem is also quite important in Alias-Free GAN: the filter parameters should be optimal for the particular domain.

GLivshits avatar Oct 08 '21 12:10 GLivshits
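To put a number on how strongly the [1,3,3,1] kernel attenuates high frequencies, here is a small sketch that evaluates the magnitude response of the normalized 1D kernel at a few frequencies. It uses only the Python standard library and is not taken from the repository; the function names are hypothetical, and the [1,2,1] kernel is included purely as an odd-length comparison chosen for this example.

```python
import cmath
import math


def magnitude_response(kernel, omega):
    """|H(omega)| of a 1D FIR kernel normalized to unit DC gain.

    omega is in radians per sample, so omega = pi is the Nyquist frequency.
    """
    total = sum(kernel)
    taps = [k / total for k in kernel]
    return abs(sum(t * cmath.exp(-1j * omega * n) for n, t in enumerate(taps)))


def response_table(kernel, steps=4):
    """(fraction of Nyquist, 1D gain) pairs from DC up to Nyquist."""
    return [
        (i / steps, magnitude_response(kernel, math.pi * i / steps))
        for i in range(steps + 1)
    ]


if __name__ == "__main__":
    for kernel in ([1, 3, 3, 1], [1, 2, 1]):
        print("kernel", kernel)
        for frac, gain in response_table(kernel):
            # A separable 2D blur applies the kernel along both axes, so a
            # component at this frequency on both axes is scaled by gain ** 2.
            print(f"  {frac:.2f} x Nyquist: 1D gain {gain:.4f}, 2D gain {gain ** 2:.4f}")
```

For the normalized [1,3,3,1] kernel the 1D gain equals cos^3(omega/2), which falls to zero at Nyquist. That is consistent with the observation above that the blur removes the finest detail, although this sketch says nothing about how the generator compensates for it.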