stylegan2-ada Unable to Train with FID metrics Enabled

Open botty-mc-bot-face opened this issue 3 years ago • 3 comments

When training, the code hangs while calculating the FID on real images after epoch 0.

I let it run overnight without any progress. In the morning I was still seeing 12% CPU utilization, full GPU memory, and no GPU utilization.
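A hang like this can be narrowed down by having Python periodically dump the stack of every thread, which shows the line the metric code is stuck on. The following is a minimal sketch using only the standard-library `faulthandler` module; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the log file name, and the idea of calling them from `train.py` are assumptions for illustration and are not part of either repository.

```python
import faulthandler


def arm_hang_watchdog(timeout_s, log_path):
    """Dump every thread's Python stack to log_path each timeout_s seconds."""
    # The file must stay open for as long as the watchdog is armed,
    # because faulthandler writes to its file descriptor directly.
    log = open(log_path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
    return log


def disarm_hang_watchdog(log):
    """Cancel the pending dumps and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log.close()


# Hypothetical usage near the top of train.py (names are assumptions):
# log = arm_hang_watchdog(600, "hang_traces.log")
# ... start training ...
# disarm_hang_watchdog(log)
```

The dump only covers Python frames, so if the process is blocked inside a native TensorFlow call, the trace will show the Python line that made that call rather than the native code itself.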

Training proceeds with the "--metrics none" flag added to the command line.

The dataset is around 88k images, but I do not see this behavior on the vanilla NVIDIA code.

Your metric_base.py seems identical to NVIDIA's, as does the way you invoke it in train.py.

I wonder if the "top_k" modifications to dnnlib might have something to do with this behavior? I see that while metric_base.py is unmodified, it does import dnnlib.

Dec 12 '20 16:12 botty-mc-bot-face

top_k should only affect models if you use a specific config, but I’ll try to take a look this week.

You might want to give pbaylies' fork a try with his FID-10k spec. FID tends to max out GPUs more than training without it.

Dec 21 '20 08:12 dvschultz

Thanks! I'd be curious to try some of the additional training features you've implemented.

I've been focusing my attention on refining my datasets and how I pre-process them, still using vanilla StyleGAN2-ADA. I found I can train with ADA at 1024x1024 on the 1080 Ti I bought for the purpose, whereas I needed config-e for plain old StyleGAN2 above 512.

Your implementations of the various interpolation methods are wonderfully easy to use, thanks!

Dec 22 '20 01:12 botty-mc-bot-face

Sure thing. I tend to work with more abstract things where FID isn’t as important. You can always train in Peter’s fork and then use my repo for interpolations. Not ideal, but it works.

Jan 02 '21 16:01 dvschultz
