improved-diffusion
no_sync() used when using DDP
I found this part of train_util.py quite confusing:
if last_batch or not self.use_ddp:
    losses = compute_losses()
else:
    with self.ddp_model.no_sync():
        losses = compute_losses()
Is this part supposed to be like this? When the model is wrapped in DDP, why do we need to stop gradient synchronization across the GPUs?
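For context, the usual reason for this pattern is gradient accumulation over microbatches: `no_sync()` suppresses DDP's gradient all-reduce during `backward()`, so local gradients accumulate on each GPU, and the cross-GPU reduction fires only once, on the last microbatch. Below is a minimal sketch of that behavior; it uses a hypothetical stand-in class instead of real `torch.nn.parallel.DistributedDataParallel`, purely to show when the sync happens:

```python
from contextlib import contextmanager

class FakeDDPModel:
    """Hypothetical stand-in for DistributedDataParallel, only tracking
    whether backward() would trigger a gradient all-reduce."""
    def __init__(self):
        self.require_sync = True
        self.sync_events = []  # records which microbatches actually synced

    @contextmanager
    def no_sync(self):
        # Inside this context, backward() skips the all-reduce,
        # mirroring DDP's no_sync() semantics.
        self.require_sync = False
        try:
            yield
        finally:
            self.require_sync = True

    def backward(self, step):
        # Real DDP launches the gradient all-reduce during backward();
        # here we just record whether it would have fired.
        if self.require_sync:
            self.sync_events.append(step)

model = FakeDDPModel()
n_micro = 4
for i in range(n_micro):
    last_batch = (i == n_micro - 1)
    if last_batch:
        model.backward(i)          # all-reduce fires only here
    else:
        with model.no_sync():
            model.backward(i)      # gradients accumulate locally, no comm

print(model.sync_events)  # → [3]: only the final microbatch synced
```

Without `no_sync()`, every microbatch would pay the communication cost of an all-reduce; with it, the GPUs exchange gradients once per optimizer step, which is why `train_util.py` only lets the last microbatch synchronize.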