accelerate
Accelerate 0.31.0 gradient accumulation bug.
System Info
I updated accelerate from 0.30.0 to 0.31.0, and all my training runs with gradient_accumulation_steps > 1 started to collapse. Please double-check that gradient accumulation still works correctly in this release.
Reproduction
Train with mixed_precision='fp16' and gradient_accumulation_steps > 1.
Expected behavior
Training should be stable with both gradient_accumulation_steps = 1 and gradient_accumulation_steps > 1.
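To make the expectation concrete, here is a minimal sketch in plain Python (no accelerate or torch involved) of the invariant I am relying on: summing micro-batch gradients, each scaled by 1/accumulation_steps, should match the gradient of the full batch. The toy model, the data, and the helper names `grad_mse` and `accumulated_grad` are made up for illustration and are not part of any library.

```python
def grad_mse(w, xs, ys):
    """Gradient of 0.5 * mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accumulation_steps):
    """Split the batch into equal micro-batches and sum their gradients,
    scaling each one by 1/accumulation_steps (assumes len(xs) is divisible)."""
    size = len(xs) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        lo, hi = i * size, (i + 1) * size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulation_steps
    return total


# Hypothetical toy data: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, accumulation_steps=4)
print(abs(full - accum) < 1e-9)
```

If the two values agree up to floating-point error, the optimizer step is the same with or without accumulation, which is why I would expect comparable stability in both settings.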