
Regarding gradient-accumulate-every

Open RayeRTX opened this issue 3 years ago • 2 comments

Hello all!

Does the --gradient-accumulate-every value determine the number of batches used to accumulate the gradient (without zeroing it) before the model weights are updated and the gradients are zeroed?
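(For reference, this is the standard gradient-accumulation pattern in PyTorch. The minimal sketch below uses a hypothetical toy model and optimizer purely for illustration; it is not this library's actual training loop.)

    import torch
    from torch import nn

    model = nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    batch_size = 1                   # corresponds to --batch-size
    gradient_accumulate_every = 16   # corresponds to --gradient-accumulate-every

    optimizer.zero_grad()
    for step in range(gradient_accumulate_every):
        x = torch.randn(batch_size, 8)   # one micro-batch of dummy data
        loss = model(x).pow(2).mean()
        # gradients from each micro-batch add up because we do not zero them here;
        # dividing by the accumulation count keeps the update comparable to one
        # large batch of size batch_size * gradient_accumulate_every
        (loss / gradient_accumulate_every).backward()

    optimizer.step()        # single weight update after 16 accumulated micro-batches
    optimizer.zero_grad()   # gradients are zeroed only after the update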

Also, the README suggests that --gradient-accumulate-every multiplied by --batch-size should be at least 32, but other examples in the README use values like

--batch-size 1 --gradient-accumulate-every 16 in

$ stylegan2_pytorch --new --data /path/to/images --name my-project-name --image-size 512 --batch-size 1 --gradient-accumulate-every 16 --network-capacity 10

and --batch-size 3 --gradient-accumulate-every 5 in

$ stylegan2_pytorch --data /path/to/data \
    --batch-size 3 \
    --gradient-accumulate-every 5 \
    --network-capacity 16

Both of these give a product of only 15 or 16 for --batch-size multiplied by --gradient-accumulate-every. How did you decide to use values so much smaller than 32?

Thanks!

RayeRTX avatar Oct 02 '20 14:10 RayeRTX

@RayeRTX Hi again! You actually want that value to be as large as possible. For large-scale GAN training (BigGAN), people aim for batch sizes of 256 or beyond! However, I want people to get a taste of disentanglement within the day, and I found the minimum was around 16 with still-OK results. In reality, you should aim for at least 32!
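(For instance, in the same CLI style as the README examples above, a combination like the following would give an effective batch size of 4 × 8 = 32; the specific numbers are illustrative, not taken from the README, and should be scaled down if they exceed your GPU memory.)

$ stylegan2_pytorch --data /path/to/data --batch-size 4 --gradient-accumulate-every 8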

lucidrains avatar Oct 02 '20 19:10 lucidrains

@RayeRTX what are you training on? care to share your results? :) I relish seeing what others have trained

lucidrains avatar Oct 02 '20 19:10 lucidrains