
Image size and differences from the official implementation

cientgu opened this issue on Feb 19 '19 · 20 comments

Thanks for this work! But I still have two questions:

  1. The number of StyledConvBlocks is 6, so the output image size is no more than 128, am I right? Can this be extended to 512 or even 1024 images?
  2. Are there any differences between this implementation and the official code (or paper)?

cientgu avatar Feb 19 '19 06:02 cientgu

  1. You can add more layers and extend the model to higher resolutions. (Starting from 4x4, each block doubles the resolution, so 6 blocks give 4 x 2^5 = 128, and 9 blocks would give 1024.)
  2. I think I matched almost all the details in the paper. I haven't checked every detail of the official implementation, but the two look very similar. A few details differ slightly: I used native bilinear interpolation, whereas the official implementation uses a binomial filter, and the learning rates differ as well: this implementation uses 1e-3 (same as the Progressive GAN paper) while the official implementation uses 1.5e-3. (See the sketch below for the filter difference.)
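
For anyone comparing the two, here is a rough sketch of the binomial (blur) filter versus bilinear upsampling. The function names are mine, not from either codebase; treat it as an illustration, not the actual implementation:

```python
import torch
import torch.nn.functional as F

def binomial_blur(x):
    # Depthwise [1, 2, 1] binomial filter (outer product, normalized),
    # the kind of low-pass the official code applies after upscaling.
    k = torch.tensor([1.0, 2.0, 1.0])
    w = (k[:, None] * k[None, :] / 16.0)[None, None].repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, w, padding=1, groups=x.shape[1])

x = torch.randn(1, 3, 8, 8)
# Official style: upscale -> conv -> blur (the blur is shown here on its own)
blurred = binomial_blur(x)
# This repo: bilinear upsampling, which low-passes similarly on the way up
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
```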

rosinality avatar Feb 19 '19 08:02 rosinality

Thanks! I will try higher resolutions.

cientgu avatar Feb 19 '19 13:02 cientgu

I think I've found a difference with the official implementation.

In the StyledConvBlock, the noise is injected after the AdaIN operation, whereas the official implementation injects it just after the conv, before the AdaIN operation. Could this be the reason for the difference in results?


I'm trying to take the parameters from the official pretrained model (in TensorFlow) and load them into your network to see if I get the same results. I'll fix this point in my forked repository and come back here if I notice any more differences.
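
For reference, a minimal sketch of the official ordering (conv output -> noise -> AdaIN). The class and parameter names here are illustrative, not the repo's actual ones:

```python
import torch
from torch import nn

class NoiseThenAdaIN(nn.Module):
    """Illustrative block: noise is injected before AdaIN, as in the official code."""

    def __init__(self, channels, style_dim):
        super().__init__()
        self.noise_weight = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.norm = nn.InstanceNorm2d(channels)
        self.style = nn.Linear(style_dim, channels * 2)  # predicts AdaIN scale and bias

    def forward(self, x, style, noise):
        x = x + self.noise_weight * noise                # 1. noise right after the conv
        x = self.norm(x)                                 # 2. instance normalization
        gamma, beta = self.style(style).chunk(2, dim=1)
        return gamma[:, :, None, None] * x + beta[:, :, None, None]  # 3. AdaIN
```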

aparafita avatar Mar 04 '19 15:03 aparafita

It's my mistake. Thanks! Changed in 24896bb

rosinality avatar Mar 05 '19 01:03 rosinality

I think I found another difference too. In https://github.com/rosinality/style-based-gan-pytorch/blob/master/model.py#L266, you only apply to_rgb() when i == step, while in the official implementation, they apply torgb in all blocks.

The same problem exists in the Discriminator.

zhuhaozh avatar Mar 05 '19 09:03 zhuhaozh

Hmm, but wouldn't lerp_clip make the model ignore the previous torgbs?
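
(For context: lerp_clip in the official TF code is, as I understand it, just a clipped linear interpolation, so once the fade-in weight saturates one branch drops out of the blend entirely. A plain-Python sketch:)

```python
def lerp_clip(a, b, t):
    # a + (b - a) * clip(t, 0, 1): once t >= 1 this returns b alone,
    # so the torgb output feeding `a` no longer affects the image.
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t
```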

rosinality avatar Mar 06 '19 00:03 rosinality

Something I noticed was that here: https://github.com/rosinality/style-based-gan-pytorch/blob/master/model.py#L270

You are sending the upsampled activations through the previous step's toRGB, because this line executes first: https://github.com/rosinality/style-based-gan-pytorch/blob/master/model.py#L259 and only then do you interpolate between the two RGB outputs.

Whereas in the official implementation, the activations of each step are run through the corresponding torgb layer, and the resulting output image is upsampled afterwards to do the interpolation: https://github.com/NVlabs/stylegan/blob/master/training/networks_stylegan.py#L542

Was this intentional?

mileslefttogo avatar Mar 22 '19 00:03 mileslefttogo

Both will be almost the same. But applying torgb before upsampling is more efficient, as it reduces the number of channels first.
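
To make the two orderings concrete, a small self-contained sketch (shapes and variable names are illustrative, not the repo's actual ones):

```python
import torch
import torch.nn.functional as F
from torch import nn

channels, alpha = 128, 0.3
to_rgb_prev = nn.Conv2d(channels, 3, 1)      # 1x1 conv to RGB, as both codebases use
to_rgb_cur = nn.Conv2d(channels, 3, 1)
out_prev = torch.randn(1, channels, 32, 32)  # activations from the previous block
out_cur = torch.randn(1, channels, 64, 64)   # activations from the current block

# This repo: upsample the 128-channel activations, then convert to RGB
skip_a = to_rgb_prev(F.interpolate(out_prev, scale_factor=2, mode='bilinear',
                                   align_corners=False))

# Official: convert to RGB first (3 channels), then upsample the small image
skip_b = F.interpolate(to_rgb_prev(out_prev), scale_factor=2, mode='bilinear',
                       align_corners=False)

# Either skip is blended with the current block's RGB output during fade-in
image = (1 - alpha) * skip_a + alpha * to_rgb_cur(out_cur)
```

The official ordering does both the 1x1 conv and the upsample on a 3-channel image at the lower resolution, which touches far less data.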

rosinality avatar Mar 22 '19 01:03 rosinality

@rosinality @aparafita guys, am I correct that these before/after changes do not require retraining? It feels like this impacts only inference.

voa18105 avatar Mar 26 '19 07:03 voa18105

Unfortunately this will require retraining, as the noise term interacts with the adaptive instance norm.

rosinality avatar Mar 26 '19 08:03 rosinality

@voa18105 The function will be affected for sure. The AdaIN changes the scale of each channel, so if the noise comes before it, the scale of the noise is also affected. In that sense, the official implementation makes sense and the noise should be injected before the AdaIN, but it's hard to say how important it'd be to the overall result.
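
A rough numeric illustration of that interaction (toy tensors; adain here is a stripped-down stand-in with the bias omitted):

```python
import torch

torch.manual_seed(0)
x = torch.randn(1, 1, 64, 64)
noise = torch.randn_like(x)

def adain(t, gamma):
    # instance normalization followed by a style-driven scale (bias omitted)
    return gamma * (t - t.mean()) / t.std()

for gamma in (0.5, 4.0):
    # Official order: the noise passes through AdaIN, so its effective
    # magnitude in the output scales with the style gamma.
    noise_before = adain(x + noise, gamma) - adain(x, gamma)
    # Noise-after order: the noise keeps its original std regardless of gamma.
    print(f"gamma={gamma}: noise-before std={noise_before.std().item():.2f}, "
          f"noise-after std={noise.std().item():.2f}")
```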

aparafita avatar Mar 26 '19 08:03 aparafita

oh no, 3 days retrain... again...

voa18105 avatar Mar 26 '19 08:03 voa18105

In the official implementation, they use blur after the upscale conv.

But this repo does not use the upscale conv when upscaling the image.

https://github.com/rosinality/style-based-gan-pytorch/blob/24896bb6c080e9c0fb233c7b3647422d65d73dc3/model.py#L258-L261

Did I miss something here?

zxch3n avatar Mar 28 '19 19:03 zxch3n

In the official implementation, they use blur after the upscale conv.

But this repo does not use the upscale conv when upscaling the image.

style-based-gan-pytorch/model.py, lines 258 to 261 in 24896bb:

```python
if i > 0 and step > 0:
    upsample = F.interpolate(out, scale_factor=2, mode='bilinear', align_corners=False)
    # upsample = self.blur(upsample)
    out = conv(upsample, style_step, noise[i])
```

Did I miss something here?

Bilinear upsampling takes the place of the upscale conv + blur: since in PyTorch the upscaling uses interpolate anyway, the bilinear filtering on the way up is essentially the same as the blur.

It is slightly different, but I changed it to be exactly the same and it didn't make a noticeable difference qualitatively or in FID score. The StyleGAN paper also mentions they tried bilinear upsampling and that it made a small improvement, although I didn't see it in the code.

mileslefttogo avatar Mar 28 '19 23:03 mileslefttogo

@mileslefttogo what I don't understand here is why the upscale conv layer can be replaced as well, as one is trainable while the other is not.

zxch3n avatar Mar 29 '19 04:03 zxch3n

The official implementation uses upscale -> conv -> blur; my implementation uses upscale (bilinear) -> conv. So yes, the order is different. (Upscale & blur work similarly to bilinear interpolation, except at the edges, as @mileslefttogo said. I used bilinear interpolation for speed.) I don't know whether it will make much difference, but maybe you can try changing the ordering.
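
A compact sketch of the two pipelines (names are illustrative; the blur would be the binomial filter sketched earlier in the thread):

```python
import torch
import torch.nn.functional as F
from torch import nn

conv = nn.Conv2d(64, 64, 3, padding=1)
x = torch.randn(1, 64, 16, 16)

# Official ordering (schematic): nearest upscale -> conv -> blur, i.e. roughly
#   out = blur(conv(F.interpolate(x, scale_factor=2, mode='nearest')))
# where blur is the depthwise binomial filter shown earlier.

# This repo: bilinear upscale -> conv; the bilinear filter itself does the smoothing
out = conv(F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False))
```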

rosinality avatar Mar 29 '19 05:03 rosinality

Now I get it. Thanks!

zxch3n avatar Mar 29 '19 06:03 zxch3n

@rosinality @aparafita @voa18105 Hi guys, have you trained a new model with the bug-fix version (commit 24896bb)? I would appreciate it if any of you could provide a more advanced model pre-trained on FFHQ. A further question: have you managed to train a model that generates high-resolution images?

Cold-Winter avatar Apr 01 '19 03:04 Cold-Winter

@Cold-Winter as I understand it, this implementation does not support HQ. Also, I don't have FFHQ.

voa18105 avatar Apr 01 '19 08:04 voa18105

@Cold-Winter I don't know if I can get enough computing resources to train a high-resolution model in a reasonable time... But I will revise the code to allow training models at higher resolutions.

rosinality avatar Apr 01 '19 14:04 rosinality