
Accelerate 0.31.0 gradient accumulation bug.

Open · nikitabalabin opened this issue 8 months ago · 1 comment

System Info

After updating accelerate from 0.30.0 to 0.31.0, all my training runs with gradient_accumulation_steps > 1 started to collapse. Please double-check that gradient accumulation behaves correctly in 0.31.0.

Reproduction

Train with mixed_precision='fp16' and gradient_accumulation_steps > 1.

Expected behavior

Training should be stable with both gradient_accumulation_steps = 1 and gradient_accumulation_steps > 1.
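
To make the expectation concrete, here is a minimal sketch in plain Python (no accelerate or torch involved) of why the accumulation setting should not change the update. With equally sized micro-batches and each micro-batch gradient scaled by 1/gradient_accumulation_steps, the accumulated gradient should match the full-batch gradient. The helper names `grad_mse` and `accumulated_grad` and the toy data are hypothetical, chosen only for illustration. This is not the code from the failing runs and not how accelerate implements accumulation internally.

```python
# Illustrative only: a 1-D linear model y_hat = w * x with mean squared error.
# All names and data here are hypothetical; nothing below uses accelerate.

def grad_mse(w, batch):
    """Gradient of the mean squared error with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accumulation_steps):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / accumulation_steps, which is
    equivalent to dividing the micro-batch loss by accumulation_steps.
    """
    if len(batch) % accumulation_steps != 0:
        raise ValueError("batch size must be divisible by accumulation_steps")
    size = len(batch) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        micro = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro) / accumulation_steps
    return total


data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1), (4.0, 8.2)]
full = grad_mse(0.5, data)                                   # one large batch
accum = accumulated_grad(0.5, data, accumulation_steps=2)    # two micro-batches

# The two gradients should agree up to floating-point rounding.
print(abs(full - accum) < 1e-9)
```

If the accumulated gradient were scaled differently from this (for example, by an inconsistency between loss scaling and the accumulation count), the effective step size would change with gradient_accumulation_steps, which could plausibly show up as unstable training. That is only a hypothesis about the symptom, not a diagnosis of this bug.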

nikitabalabin · Jun 17 '24 21:06