stylegan2-pytorch
Regarding gradient-accumulate-every
Hello all!
Does the --gradient-accumulate-every value determine the number of batches used to accumulate the gradient (without zeroing it) before the model weights are updated and the gradients are zeroed?
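For what it's worth, the mechanism can be sketched in plain Python without any framework. This is only a conceptual illustration of gradient accumulation, not the library's actual internals; the function names and the scalar model here are made up for the example. The key point is that accumulating per-micro-batch gradients (scaled by the number of micro-batches) before a single update is equivalent to one update on the combined batch:

```python
# Conceptual sketch of gradient accumulation using a scalar model
# y = w * x with mean-squared-error loss. All names are illustrative.

def grad(w, xs, ts):
    """Gradient of the mean squared error of y = w * x over a batch."""
    n = len(xs)
    return sum(2 * x * (w * x - t) for x, t in zip(xs, ts)) / n

def accumulated_step(w, micro_batches, lr):
    """Accumulate gradients over several micro-batches, then update once.

    Each micro-batch gradient is divided by the number of micro-batches
    so the accumulated gradient matches the full-batch gradient."""
    k = len(micro_batches)
    g = 0.0
    for xs, ts in micro_batches:      # gradients pile up; no zeroing here
        g += grad(w, xs, ts) / k
    return w - lr * g                 # single weight update, then zero

# Two micro-batches of size 2 vs. one batch of size 4 (targets t = 2x):
micro = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
full_xs = [1.0, 2.0, 3.0, 4.0]
full_ts = [2.0, 4.0, 6.0, 8.0]

w_acc = accumulated_step(1.0, micro, lr=0.01)
w_full = 1.0 - 0.01 * grad(1.0, full_xs, full_ts)
# w_acc and w_full are identical: same effective batch of 4 samples.
```

In PyTorch the same effect comes from calling `loss.backward()` several times before one `optimizer.step()` and `optimizer.zero_grad()`, since `.backward()` adds into the existing `.grad` buffers.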
Also, the README suggests that --gradient-accumulate-every multiplied by --batch-size should be at least 32, but other examples in the README use values like --batch-size 1 --gradient-accumulate-every 16 in
$ stylegan2_pytorch --new --data /path/to/images --name my-project-name --image-size 512 --batch-size 1 --gradient-accumulate-every 16 --network-capacity 10
and --batch-size 3 --gradient-accumulate-every 5 in
$ stylegan2_pytorch --data /path/to/data \
--batch-size 3 \
--gradient-accumulate-every 5 \
--network-capacity 16
Both of these give an effective batch size of only about 15 (--batch-size multiplied by --gradient-accumulate-every). How did you decide on values so much smaller than 32?
Thanks!
@RayeRTX Hi again! You actually want that value to be as large as possible. For large-scale GAN training (BigGAN), people aim for batch sizes of 256 or beyond! However, I want people to get a taste of disentanglement within a day, and I found the minimal effective batch size was around 16 with still-OK results. In reality, you should aim for at least 32!
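Given that rule of thumb, picking --gradient-accumulate-every for a batch size your GPU can fit is just a ceiling division. A small helper sketch (the function name is made up for illustration; the 32 target comes from the README's recommendation):

```python
import math

def accumulate_every_for(target_effective, batch_size):
    """Smallest --gradient-accumulate-every such that
    batch_size * gradient_accumulate_every >= target_effective."""
    return math.ceil(target_effective / batch_size)

accumulate_every_for(32, 1)   # 32 -> effective batch 32
accumulate_every_for(32, 3)   # 11 -> effective batch 33
accumulate_every_for(32, 4)   # 8  -> effective batch 32
```

So with --batch-size 3, bumping --gradient-accumulate-every from 5 up to 11 would clear the suggested 32 threshold, at the cost of proportionally slower wall-clock training.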
@RayeRTX what are you training on? care to share your results? :) I relish seeing what others have trained