no_sync equivalent used for gradient accumulation
With gradient accumulation, we do not need to all-reduce the gradients for the first N - 1 iterations. With PyTorch DDP, we can use no_sync() for this, as documented below. Is there any equivalent in apex DDP?
https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.no_sync
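For reference, a minimal sketch of the no_sync() pattern from the linked docs (the model, optimizer, loader, and `accumulation_steps` names are placeholders, not part of any library API):

```python
import contextlib

def train_epoch(ddp_model, optimizer, data_loader, loss_fn, accumulation_steps=4):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader):
        # Skip the gradient all-reduce on the first N - 1 micro-batches;
        # only the last micro-batch of each accumulation window syncs.
        is_sync_step = (step + 1) % accumulation_steps == 0
        ctx = contextlib.nullcontext() if is_sync_step else ddp_model.no_sync()
        with ctx:
            loss = loss_fn(ddp_model(inputs), targets) / accumulation_steps
            loss.backward()
        if is_sync_step:
            optimizer.step()
            optimizer.zero_grad()
```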
@amsword Have you solved this problem?
@amsword @601222543 I have the same problem; do you have any solution? Thanks.