improved-diffusion
no_sync() used when using DDP
I found this part of train_util.py quite confusing:
if last_batch or not self.use_ddp:
    losses = compute_losses()
else:
    with self.ddp_model.no_sync():
        losses = compute_losses()
Is this part supposed to be like this? When the model is wrapped in DDP, why do we need to stop gradient synchronization across the GPUs?
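For context, the usual reason for this pattern is gradient accumulation over microbatches: `no_sync()` suppresses DDP's gradient all-reduce during `backward()`, so local gradients accumulate on each GPU, and the cross-GPU reduction fires only once, on the last microbatch. Below is a minimal sketch of that behavior; it uses a hypothetical stand-in class instead of real `torch.nn.parallel.DistributedDataParallel`, purely to show when the sync happens:

```python
from contextlib import contextmanager

class FakeDDPModel:
    """Hypothetical stand-in for DistributedDataParallel, only tracking
    whether backward() would trigger a gradient all-reduce."""
    def __init__(self):
        self.require_sync = True
        self.sync_events = []  # records which microbatches actually synced

    @contextmanager
    def no_sync(self):
        # Inside this context, backward() skips the all-reduce,
        # mirroring DDP's no_sync() semantics.
        self.require_sync = False
        try:
            yield
        finally:
            self.require_sync = True

    def backward(self, step):
        # Real DDP launches the gradient all-reduce during backward();
        # here we just record whether it would have fired.
        if self.require_sync:
            self.sync_events.append(step)

model = FakeDDPModel()
n_micro = 4
for i in range(n_micro):
    last_batch = (i == n_micro - 1)
    if last_batch:
        model.backward(i)          # all-reduce fires only here
    else:
        with model.no_sync():
            model.backward(i)      # gradients accumulate locally, no comm

print(model.sync_events)  # → [3]: only the final microbatch synced
```

Without `no_sync()`, every microbatch would pay the communication cost of an all-reduce; with it, the GPUs exchange gradients once per optimizer step, which is why `train_util.py` only lets the last microbatch synchronize.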