PyTorch-StudioGAN
Speeding up ReACGAN training
Hi,
I'm training ReACGAN with a training set of 28,003 images, img_size: 128, on 4 GPUs. It takes 10 hours for 10% of the training to finish. Are there any configurations I can use to speed up the training, or is this expected? I couldn't find any time measurements.
I see there are some parameters inside cfgs.RUN. For example, I was thinking of trying mixed_precision. However, I would appreciate any other ideas I could check to speed up the training.
Thank you!
Actually, training a GAN would try the patience of Job.
I trained BigGAN and ReACGAN on ImageNet for a month each (please refer to Appendix H of the ReACGAN paper).
Those training times were measured without mixed precision, so you can accelerate ReACGAN with the -mpc option if your GPU supports mixed-precision training.
Also, you can use the -DDP option to accelerate data-parallel training.
Lastly, loading all the images into the main memory of your machine can also help speed up training.
Please add the -hdf5 and -l options together. One thing to note: applying -hdf5 and -l simultaneously is not compatible with -DDP training, so if you want to train your model with -DDP, turn on -hdf5 only.
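Putting the suggestions above together, a launch command might look like the sketch below. The config path, dataset path, and save path are placeholders you would replace with your own; double-check the available flags with `python3 src/main.py --help` for your StudioGAN version.

```shell
# Hypothetical 4-GPU ReACGAN run combining the speed-ups above.
# -t    : train
# -mpc  : mixed-precision training (requires GPU support for it)
# -DDP  : DistributedDataParallel instead of DataParallel
# -hdf5 : convert/load the dataset via HDF5 for faster I/O
# Note: -l (load the dataset into main memory) is omitted here,
# because -hdf5 together with -l is not compatible with -DDP.
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 src/main.py -t -mpc -DDP -hdf5 \
    -cfg PATH/TO/ReACGAN_CONFIG.yaml \
    -data PATH/TO/DATASET \
    -save PATH/TO/OUTPUT
```

Without -DDP, you could instead combine -hdf5 and -l to keep the whole dataset in memory.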
Best,
Minguk