
A question related to batch size and training speed

Open paidaxinbao opened this issue 1 year ago • 2 comments

Hi!

The code in this repository has helped me a lot!

I found that as the batch size increases, the estimated training time increases dramatically. With a batch size of 4 (the dataset has 25k images) the estimated training time is about 2 days, but with a batch size of 128 it jumps to about 800 hours!

I don't understand why this happens.

My training configuration is as follows:

```python
model = Unet(
    dim=64,
    out_dim=1,
    dim_mults=(1, 2, 4, 8),
    channels=2
)

diffusion = GaussianDiffusion(
    model,
    image_size=128,
    timesteps=1000,           # number of steps
    sampling_timesteps=250    # number of sampling timesteps (using ddim for faster inference [see citation for ddim paper])
)

trainer = Trainer(
    diffusion,
    '/home/pxy/ML_work/train_picset/',
    train_batch_size=4,
    train_lr=8e-5,
    train_num_steps=700000,        # total training steps
    gradient_accumulate_every=4,   # gradient accumulation steps
    ema_decay=0.995,               # exponential moving average decay
    amp=True,                      # turn on mixed precision
    calculate_fid=False
)

trainer.train()
```
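For what it's worth, here is a minimal sketch of my own guess (an assumption on my part, not something stated in the repo's docs) about why the estimate grows: `train_num_steps` fixes the number of optimizer steps, not epochs, and each step draws `train_batch_size * gradient_accumulate_every` images, so the total work per run scales with the batch size.

```python
# Assumption: total images drawn over a run = steps * batch size * accumulation.
# This is a rough back-of-the-envelope model, not the Trainer's actual accounting.
def images_processed(train_num_steps, train_batch_size, gradient_accumulate_every):
    """Approximate number of image samples drawn over a full training run."""
    return train_num_steps * train_batch_size * gradient_accumulate_every

small = images_processed(700_000, 4, 4)    # my batch-size-4 run
large = images_processed(700_000, 128, 4)  # my batch-size-128 run
print(small, large, large / small)         # the larger run draws 32x more images
```

Under this rough model the batch-size-128 run draws 32x more images, which would explain most (though not all) of the jump from ~2 days to ~800 hours in the progress-bar estimate.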

paidaxinbao avatar May 27 '24 02:05 paidaxinbao