
Generating faces

Open oliverguhr opened this issue 5 years ago • 29 comments

Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces, but unfortunately the results are not very convincing. I left all the parameters at their default values and trained for over 500,000 iterations. After 530,000 iterations I stopped the training because the results started to degrade and the discriminator loss was at or close to 0.
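
For reference, a minimal sketch of what such a default run might look like with this repo's Trainer class (the class and argument names are assumptions based on the CLI flags of the versions discussed here, e.g. --image-size and --batch-size; the run name and dataset folder are placeholders):

```python
from tqdm import tqdm
from stylegan2_pytorch import Trainer  # assumed import path for this repo's trainer

trainer = Trainer(
    name='ffhq-thumbnails',   # placeholder run name
    image_size=128,           # target resolution from the post above
    batch_size=3,             # the default batch size discussed later in this thread
    network_capacity=16,      # assumed default capacity
)
trainer.set_data_src('./ffhq-thumbnails')  # placeholder folder with the 70k images

for _ in tqdm(range(500_000)):  # roughly the number of iterations mentioned above
    trainer.train()
```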

Here are the results

What would be the best way to improve the results?

- Train on high-resolution images
- Use different training parameters
- Use more images

oliverguhr avatar Mar 23 '20 09:03 oliverguhr

Hi Oliver,

Thanks for trying this replication. I made a small change to the data augmentation that may give you better results. https://github.com/lucidrains/stylegan2-pytorch/commit/f00ba1f50988ca1f7af3075223b0f404218d480a You should try it at a larger resolution if you can. The amount of data should be sufficient.

lucidrains avatar Mar 23 '20 11:03 lucidrains

Thank you very much! I downloaded the full-resolution images and started the training with your updates. I'll post the results as soon as they are ready (+33h).

oliverguhr avatar Mar 23 '20 18:03 oliverguhr

The results are looking a bit strange. Here is the model output after 195 epochs: on the left, the model trained on version 0.4.15 with the small images; on the right, the model trained on version 0.4.16 with the full-resolution images.

iter-195

Also, the model output didn't change much from epoch 50 to 195.

oliverguhr avatar Mar 24 '20 15:03 oliverguhr

@oliverguhr Indeed, I tried doing a couple runs on my own dataset (which took an entire day) and confirmed that it stopped learning - perhaps the relaxed data augmentation made it too easy for the discriminator. I brought back the random aspect ratios for now, but a bit closer to a ratio of 1, so it should be less distorted than before. I am doing another training run at the moment to verify that learning is restored. Sorry about that, and I will continue to look into this issue to see how I can fully remove the random aspect ratios.

lucidrains avatar Mar 26 '20 15:03 lucidrains

First of all, thank you for all the work you have put into this! I played a bit with the parameters and started a new run with a batch size of 7 and a network capacity of 24. This is the maximum that fits into my 11 GB of VRAM. It's way slower and still running, but the losses are much more stable and the result looks better. Here is the result after 123k iterations. 123-ema

I trained the previous model with the default batch size of 3. I think that these small batches could be the reason why the model had problems; I have had problems in the past where small batches led to unstable loss gradients. But since I also changed the network capacity, I am not sure which parameter led to the improvement.
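
For context, the run described above would look roughly like this (argument names assumed from the repo's --batch-size and --network-capacity CLI flags; a sketch, not a verified configuration):

```python
from stylegan2_pytorch import Trainer  # assumed import path

trainer = Trainer(
    name='ffhq-bs7-cap24',   # placeholder run name
    image_size=128,
    batch_size=7,            # up from the default of 3
    network_capacity=24,     # larger capacity needs more VRAM; this fits in ~11 GB here
)
```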

oliverguhr avatar Mar 27 '20 08:03 oliverguhr

@oliverguhr oh my, that looks great! I have made some further changes, and fully removed the ratio data augmentation on the newest version. Yes, the network capacity linearly corresponds with the number of parameters, and as you know, with deep learning, the bigger the better lol

I will need to look into setting some new defaults for batch size. I agree they are probably not as big as they should be.

lucidrains avatar Apr 01 '20 21:04 lucidrains

I trained the model a while longer (200k iterations), with the best results at about 160k iterations.

161-ema

However, after that it only got worse, and there are still some artefacts in it. Since I want to know which parameter leads to the improvement, I am currently running a second try with the default batch size of 3 and a network capacity of 32. I will post an update on that tomorrow, and then retrain with your latest patches.

oliverguhr avatar Apr 02 '20 15:04 oliverguhr

For me, there was no noticeable difference in the results between batch size 3 with network capacity 32 and batch size 7 with network capacity 24. But there is a difference with the newest 0.4.23 version: the model picks up the structure of the faces much quicker and produces better results with fewer iterations. Here is a preview of my current training results, from 0 to 160,000 iterations in steps of 10,000 iterations.

mr-0-full

oliverguhr avatar Apr 09 '20 09:04 oliverguhr

@oliverguhr That is because, in the latest version, I introduced a hack that is used in BigGAN and StyleGAN, called truncation. It brings the intermediate style vector closer to its average, cutting out the outliers. This results in generally better image quality.
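
In case it helps anyone reading along, a minimal sketch of the truncation idea (variable names are illustrative; the repo exposes this through a truncation setting such as trunc_psi):

```python
import torch

def truncate_styles(w, w_avg, trunc_psi=0.75):
    # Pull each intermediate style vector w toward the average style w_avg.
    # trunc_psi = 1.0 leaves w untouched; smaller values cut off outlier styles,
    # trading sample diversity for average image quality.
    return w_avg + trunc_psi * (w - w_avg)

# w_avg is typically estimated by averaging the mapping network's output
# over many random latents, e.g.:
#   w_avg = mapping_network(torch.randn(10_000, latent_dim)).mean(dim=0)
```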

lucidrains avatar Apr 09 '20 16:04 lucidrains

Hi, what does the argument 'fp16' mean, and how do I use it?

yuanlunxi avatar May 13 '20 03:05 yuanlunxi

Could you share a pretrained model for faces?

yuanlunxi avatar May 14 '20 03:05 yuanlunxi

Hi @yuanlunxi, here you can read more about FP16. I did not share my model because the results are not perfect yet. I don't know what I can expect, but the results did not look as good as what Nvidia published.
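
Roughly speaking, fp16 switches training to mixed precision (via NVIDIA Apex in the versions discussed in this thread), which lowers memory use per sample and can speed things up on recent GPUs. A hedged sketch of how it might be enabled (argument name assumed from the --fp16 CLI flag):

```python
from stylegan2_pytorch import Trainer  # assumed import path

# Requires NVIDIA Apex to be installed in the versions discussed here.
trainer = Trainer(
    name='ffhq-fp16',   # placeholder run name
    image_size=128,
    fp16=True,          # assumed keyword; mirrors the --fp16 CLI flag
)
```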

oliverguhr avatar May 14 '20 09:05 oliverguhr

In the original implementation (as in https://github.com/nvlabs/stylegan2), the default is to train for 25,000 kimg, i.e. 25,000,000 real images shown during training. I believe the disappointing results are due to a lack of training. After all, the paper claims to have trained on 8 V100s for as long as a week to yield superior results.
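
For a rough sense of scale, here is the conversion under the assumption that one iteration in this repo shows batch_size * gradient_accumulate_every real images:

```python
total_kimg = 25_000                 # NVIDIA's default training length
batch_size = 3                      # this repo's default at the time
gradient_accumulate_every = 1       # assumption; adjust to your own settings

total_images = total_kimg * 1_000   # 25,000,000 real images
iterations = total_images // (batch_size * gradient_accumulate_every)
print(iterations)                   # ~8.3 million iterations at these settings
```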

crrrr30 avatar Jun 15 '20 05:06 crrrr30

Hi @oliverguhr, if you have some better results, please share them here. Which version of the code did you try? Only 0.4.23?

Did you try the newest version, like 0.14.1?

Johnson-yue avatar Jun 17 '20 03:06 Johnson-yue

@Johnson-yue I started a new training run with the latest version of the code and it looks promising. I am using two attention layers and a resolution of 128x128.
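
For reference, a sketch of the attention setup mentioned above (the attn_layers argument is assumed from the repo's --attn-layers CLI flag; the layer indices are illustrative):

```python
from stylegan2_pytorch import Trainer  # assumed import path

trainer = Trainer(
    name='ffhq-attn',     # placeholder run name
    image_size=128,
    attn_layers=[1, 2],   # attention at two layers; indices illustrative
)
```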

This is a sample after 472,000 iterations. Still a long way to go until 25 million.

472-ema

Unfortunately, I was not able to start the training using FP16. Apex is running, but at some point, the script fails with a null exception.

oliverguhr avatar Sep 11 '20 08:09 oliverguhr

@oliverguhr good result!!

Johnson-yue avatar Sep 14 '20 02:09 Johnson-yue

I don't know what happened, but up to iteration 682k the results only got worse: 682-ema

one(!) iteration later the image looked like this:

683-ema

And after some more iterations, the images went completely dark.

@lucidrains Do you have any idea what happened here? I can provide the models and results if this helps.

oliverguhr avatar Sep 15 '20 11:09 oliverguhr

Could anyone provide us with a pre-trained PyTorch model? I assume most people won't bother training their own models and you'd also help save this planet by not allowing everybody to train a model for a week on 1313432 V100 GPUs.

gordicaleksa avatar Sep 30 '20 18:09 gordicaleksa

Sorry for the late response. Here is a list of trained models (and some sample results) that you can download:

.config.json

model_203.pt / model_203.jpg
model_300.pt / model_300.jpg
model_400.pt / model_400.jpg
model_500.pt / model_500.jpg
model_550.pt / model_550.jpg
model_600.pt / model_600.jpg
model_650.pt / model_650.jpg
model_700.pt / model_700.jpg
model_757.pt / model_757.jpg

oliverguhr avatar Oct 22 '20 11:10 oliverguhr

@oliverguhr which commit were you using to train? I'm trying to load the model you provided, but I'm not able to load it into the GAN. Some keys are missing when loading the module: "..._blocks.1.1.fn.fn.2.weight", "D_aug.D.attn_blocks.1.1.fn.fn.2.bias", "D_aug.D.final_conv.weight", "D_aug.D.final_conv.bias", "D_aug.D.to_logit.weight", "D_aug.D.to_logit.bias" ..."

jomach avatar Nov 06 '20 07:11 jomach

@jomach Version 1.2.3. I wonder if this should be part of the config.json.
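
If anyone else hits the missing-key error above, a minimal sketch of loading these checkpoints, assuming the package is pinned to the version they were trained with (the file name comes from the list above; what exactly is stored in the file depends on the version):

```python
# Pin the package to the training version first, e.g.:
#   pip install stylegan2_pytorch==1.2.3
import torch

checkpoint = torch.load('model_757.pt', map_location='cpu')  # file name from the list above

# The file holds state dicts rather than a full pickled model, so the keys only
# match if the installed package builds the same architecture as version 1.2.3.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))
```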

oliverguhr avatar Nov 06 '20 12:11 oliverguhr

I think this comes from saving only the state dictionary instead of the full model...

> @jomach Version 1.2.3. I wonder if this should be part of the config.json.

My bad. Never mind.

jomach avatar Nov 06 '20 14:11 jomach

> @jomach Version 1.2.3. I wonder if this should be part of the config.json.

Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

WoshiBoluo avatar Mar 24 '21 05:03 WoshiBoluo

> Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...

It is in the first post:

> Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,

woctezuma avatar Mar 24 '21 08:03 woctezuma

> Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...
>
> It is in the first post:
>
> Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,

Are you using multiple GPUs? How long do I have to train to get a good result? I ran 1,000 pictures from FFHQ on Colab and it took 120 hours for 150,000 iterations. Is this normal?

WoshiBoluo avatar Mar 24 '21 09:03 WoshiBoluo

You can find expected training times for StyleGAN2 here: https://github.com/NVlabs/stylegan2-ada-pytorch#expected-training-time

For 128x128 resolution, with only 1 GPU, you should expect 13 seconds per kimg of training. For full training with the recommended 25000 kimg, that is about 4 days of training (with 24h/day, which you cannot have on Colab).
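
As a back-of-the-envelope check (assuming the 13 s/kimg figure applies to the GPU you happen to get):

```python
seconds_per_kimg = 13      # NVIDIA's figure for 128x128 on a single GPU
total_kimg = 25_000        # recommended training length

total_seconds = seconds_per_kimg * total_kimg   # 325,000 s
total_days = total_seconds / 86_400             # ~3.8 days of uninterrupted training
print(round(total_days, 1))
```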

Moreover, you won't have the same GPU every time on Colab. So if you end up with a bad one, that is more training time.

Finally, it is hard to judge your 150,000 iterations, because you don't mention the batch size, or the kimg/iteration. If you have parameters similar to the ones mentioned in this post, I guess you should have similar results: https://github.com/lucidrains/stylegan2-pytorch/issues/33#issuecomment-604885302

woctezuma avatar Mar 24 '21 09:03 woctezuma

> Excuse me, what dataset are you using for training? I am using FFHQ, but the training is really slow...
>
> It is in the first post:
>
> Hello, I tried to train a model on 70k images of the FFHQ thumbnail dataset. The model should generate 128x128 images of faces,
>
> Are you using multiple GPUs? How long do I have to train to get a good result? I ran 1,000 pictures from FFHQ on Colab and it took 120 hours for 150,000 iterations. Is this normal?

Do you mean that 1,000 pictures equal 1 kimg? As I understood it, 1 kimg would be 1,000 fake-image iterations; is this true?

MationPlays avatar May 13 '22 09:05 MationPlays