Swin-Transformer
In the gradient accumulation branch, `lr_scheduler.step_update` should be called with `epoch * num_steps + idx` rather than `idx + 1`
```python
if (idx + 1) % config.TRAIN.ACCUMULATION_STEPS == 0:
    optimizer.step()
    optimizer.zero_grad()
    lr_scheduler.step_update(epoch * num_steps + idx)
```
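To illustrate why the global index matters: a step-based scheduler (timm's `step_update` takes the total number of updates so far) expects a monotonically increasing counter. A per-epoch value like `idx + 1` resets at the start of every epoch, so the schedule would rewind instead of advancing. A minimal sketch, with assumed toy values for `config.TRAIN.ACCUMULATION_STEPS` and `num_steps`:

```python
# Compare the per-epoch index "idx + 1" with the global index
# "epoch * num_steps + idx" at each point where the optimizer steps.
ACCUMULATION_STEPS = 2   # assumed stand-in for config.TRAIN.ACCUMULATION_STEPS
num_steps = 4            # assumed number of batches per epoch

per_epoch, global_idx = [], []
for epoch in range(2):
    for idx in range(num_steps):
        if (idx + 1) % ACCUMULATION_STEPS == 0:
            per_epoch.append(idx + 1)                   # resets every epoch
            global_idx.append(epoch * num_steps + idx)  # strictly increasing

print(per_epoch)   # repeats each epoch, so the scheduler would rewind
print(global_idx)  # monotonic, so the schedule keeps advancing
```

Here `per_epoch` comes out as `[2, 4, 2, 4]` while `global_idx` is `[1, 3, 5, 7]`, which is why the issue title asks for `epoch * num_steps + idx`.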