no_sync equivalent used for gradient accumulation
With gradient accumulation, we do not need to all-reduce the gradients for the first N - 1 iterations. With PyTorch DDP, we can use no_sync() for this, as documented below. Is there any equivalent in apex DDP?
https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.no_sync
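For reference, a minimal sketch of the no_sync() pattern from the linked docs (the model, optimizer, loader, and `accumulation_steps` names are placeholders, not part of any library API):

```python
import contextlib

def train_epoch(ddp_model, optimizer, data_loader, loss_fn, accumulation_steps=4):
    optimizer.zero_grad()
    for step, (inputs, targets) in enumerate(data_loader):
        # Skip the gradient all-reduce on the first N - 1 micro-batches;
        # only the last micro-batch of each accumulation window syncs.
        is_sync_step = (step + 1) % accumulation_steps == 0
        ctx = contextlib.nullcontext() if is_sync_step else ddp_model.no_sync()
        with ctx:
            loss = loss_fn(ddp_model(inputs), targets) / accumulation_steps
            loss.backward()
        if is_sync_step:
            optimizer.step()
            optimizer.zero_grad()
```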
@amsword Have you solved this problem?
@amsword @601222543 I have the same problem; do you have any solution? Thanks.