Stylegan2 Training GPU usage maximization
I'm posting this question here because the NVlabs stylegan and stylegan2 projects provide minimal instruction about training and don't allow creating issues. In the README, the authors list the expected run time for each configuration given the number of GPUs (1, 2, 4, or 8), the training length in kimg, and the resolution (a DGX-1 box has 8 Tesla V100 GPUs with 32 GB each).
Configuration | Resolution | Total kimg | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs | GPU mem |
---|---|---|---|---|---|---|---|
config-f | 1024×1024 | 25000 | 69d 23h | 36d 4h | 18d 14h | 9d 18h | 13.3 GB |
config-f | 1024×1024 | 10000 | 27d 23h | 14d 11h | 7d 10h | 3d 22h | 13.3 GB |
config-e | 1024×1024 | 25000 | 35d 11h | 18d 15h | 9d 15h | 5d 6h | 8.6 GB |
config-e | 1024×1024 | 10000 | 14d 4h | 7d 11h | 3d 20h | 2d 3h | 8.6 GB |
config-f | 256×256 | 25000 | 32d 13h | 16d 23h | 8d 21h | 4d 18h | 6.4 GB |
config-f | 256×256 | 10000 | 13d 0h | 6d 19h | 3d 13h | 1d 22h | 6.4 GB |
Question: Is there a way to tune the parameters so that GPU usage is fully maximized on the host the training runs on? If there isn't a magic flag like that, what are the key parameters I should dial up or down given my training host's hardware spec?
On one extreme: each Tesla V100 has 32 GB of memory, but the config-f 256×256 run only uses 6.4 GB of it, and a DGX-2 doubles the GPU count from 8 to 16. That run would therefore use only 8 * 6.4 / (16 * 32) = 10% of the GPU memory available on a DGX-2. If we can tweak something like the minibatch size, does that mean we could cut the training time from 13 days down to something like 2 days?
On the other extreme: I might only have two small gaming GPUs with 6 GB of memory each, which would require a different batch size, since all of the benchmarks above use more than 6 GB per GPU.
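To make the first extreme concrete, here is that back-of-the-envelope arithmetic spelled out (the 6.4 GB figure is the config-f 256×256 row above; the GPU counts and 32 GB per card are the DGX-1/DGX-2 specs):

```python
# Back-of-the-envelope memory utilization for the config-f 256x256 benchmark.
# All numbers come from the table above and the DGX-1/DGX-2 spec sheets.
mem_per_gpu_gb = 32        # Tesla V100, 32 GB variant
benchmark_gpus = 8         # the 8-GPU column of the benchmark table
benchmark_mem_gb = 6.4     # "GPU mem" column for config-f @ 256x256
dgx2_gpus = 16             # a DGX-2 has 16 V100s

used_gb = benchmark_gpus * benchmark_mem_gb        # 51.2 GB actually touched
available_gb = dgx2_gpus * mem_per_gpu_gb          # 512 GB available on a DGX-2
print(f"utilization: {used_gb / available_gb:.0%}")  # -> utilization: 10%
```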
Looking at stylegan2's `run_training.py`, the closest parameters I found are `--total-kimg` and `--num-gpus`, and maybe `--config` too.
```python
parser.add_argument('--num-gpus', help='Number of GPUs (default: %(default)s)', default=1, type=int, metavar='N')
parser.add_argument('--total-kimg', help='Training length in thousands of images (default: %(default)s)', metavar='KIMG', default=25000, type=int)
```
But `--total-kimg` feels like it just sets the total number of (thousands of) training images to show the network, i.e. the length of training rather than its width.
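If it helps, the table above is consistent with that reading: for config-f at 256×256 on 1 GPU, 10000 kimg takes 13d 0h, and scaling that linearly to 25000 kimg lands almost exactly on the listed 32d 13h. A quick sanity check using only the table values:

```python
# Sanity check: if --total-kimg is purely a training length, run time should
# scale linearly with it at a fixed config, resolution, and GPU count.
hours_for_10000_kimg = 13 * 24                       # config-f, 256x256, 1 GPU: 13d 0h
predicted_hours = hours_for_10000_kimg * 25000 / 10000
print(predicted_hours / 24)                          # ~32.5 days vs. 32d 13h in the table
```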
Looking into `training_loop.py`, there are another ~50 parameters such as `minibatch_size_base=32` and `minibatch_gpu_base=4` that I believe directly impact training throughput, but I don't fully understand which knobs I should turn.
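If I'm reading `training_loop.py` correctly (this is my assumption, not something the docs state), `minibatch_size_base` is the total number of images per optimizer step, `minibatch_gpu_base` is how many images each GPU processes per forward/backward pass, and the gap between the two is covered by gradient accumulation. A rough sketch of how those knobs would then relate to per-GPU memory and parallelism (the larger numbers are purely hypothetical):

```python
# Rough sketch, assuming:
#   minibatch_size_base : total images per optimizer step (all GPUs, all rounds)
#   minibatch_gpu_base  : images each GPU processes per forward/backward pass
#   num_gpus            : value of --num-gpus
# and that the training loop bridges the difference with gradient accumulation.
def minibatch_plan(minibatch_size_base=32, minibatch_gpu_base=4, num_gpus=8):
    images_per_pass = minibatch_gpu_base * num_gpus
    accumulation_rounds = minibatch_size_base // images_per_pass
    return images_per_pass, accumulation_rounds

print(minibatch_plan())  # defaults on 8 GPUs: (32, 1)

# Hypothetical scale-up: 16 images per GPU to use more of the 32 GB cards,
# with minibatch_size_base raised to keep a whole number of rounds.
print(minibatch_plan(minibatch_size_base=128, minibatch_gpu_base=16, num_gpus=8))  # (128, 1)
```

If that's right, per-GPU memory should scale mainly with `minibatch_gpu_base`, while wall-clock speed depends on total images per second across all GPUs, so those two knobs (plus `--num-gpus`) look like the first things to experiment with.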
Thoughts?