
Regarding gradient-accumulate-every

Open RayeRTX opened this issue 3 years ago • 2 comments

Hello all!

Does the --gradient-accumulate-every value determine the number of batches used to accumulate the gradient (without zeroing it) before the model weights are updated and the gradients are zeroed?
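(For reference, this is the standard gradient-accumulation pattern in PyTorch. The minimal sketch below uses a hypothetical toy model and optimizer purely for illustration; it is not this library's actual training loop.)

    import torch
    from torch import nn

    model = nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

    batch_size = 1                   # corresponds to --batch-size
    gradient_accumulate_every = 16   # corresponds to --gradient-accumulate-every

    optimizer.zero_grad()
    for step in range(gradient_accumulate_every):
        x = torch.randn(batch_size, 8)   # one micro-batch of dummy data
        loss = model(x).pow(2).mean()
        # gradients from each micro-batch add up because we do not zero them here;
        # dividing by the accumulation count keeps the update comparable to one
        # large batch of size batch_size * gradient_accumulate_every
        (loss / gradient_accumulate_every).backward()

    optimizer.step()        # single weight update after 16 accumulated micro-batches
    optimizer.zero_grad()   # gradients are zeroed only after the update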

Also, the README suggests that --gradient-accumulate-every multiplied by --batch-size should be at least 32, but other examples in the README use values like

--batch-size 1 --gradient-accumulate-every 16 in

$ stylegan2_pytorch --new --data /path/to/images --name my-project-name --image-size 512 --batch-size 1 --gradient-accumulate-every 16 --network-capacity 10

and --batch-size 3 --gradient-accumulate-every 5 in

$ stylegan2_pytorch --data /path/to/data \
    --batch-size 3 \
    --gradient-accumulate-every 5 \
    --network-capacity 16

Both of these give a product of only 15 or 16 for --batch-size multiplied by --gradient-accumulate-every. How did you decide to use values so much smaller than 32?

Thanks!

RayeRTX avatar Oct 02 '20 14:10 RayeRTX

@RayeRTX Hi again! You actually want that value to be as large as possible. For large-scale GAN training (BigGAN), people aim for batch sizes of 256 or beyond! However, I want people to get a taste of disentanglement within the day, and I found the minimum was around 16 with still-OK results. In reality, you should aim for at least 32!
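(For instance, in the same CLI style as the README examples above, a combination like the following would give an effective batch size of 4 × 8 = 32; the specific numbers are illustrative, not taken from the README, and should be scaled down if they exceed your GPU memory.)

$ stylegan2_pytorch --data /path/to/data --batch-size 4 --gradient-accumulate-every 8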

lucidrains avatar Oct 02 '20 19:10 lucidrains

@RayeRTX what are you training on? care to share your results? :) I relish seeing what others have trained

lucidrains avatar Oct 02 '20 19:10 lucidrains