Floris Fok
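If `max_steps` or the data length is not divisible by `gradient_accumulation_steps`, some gradients are lost, because the update only takes place when `if (step + 1) % gradient_accumulation_steps == 0:` is true. Gradients accumulated after the last step that satisfies this condition are never applied.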
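To make the effect concrete, here is a minimal pure-Python sketch that counts how many batches actually reach an update. The function name `count_applied_batches` and the `flush_tail` flag are hypothetical, introduced only for illustration; flushing on the last step is one possible remedy, not necessarily the fix the project should adopt.

```python
def count_applied_batches(num_batches, gradient_accumulation_steps, flush_tail=False):
    """Return how many batches contribute to an optimizer update.

    Hypothetical helper: it only simulates the control flow of the loop,
    no real model or optimizer is involved.
    """
    applied = 0  # batches whose gradients reached an update
    pending = 0  # batches accumulated since the last update
    for step in range(num_batches):
        pending += 1  # stands in for loss.backward()
        is_boundary = (step + 1) % gradient_accumulation_steps == 0
        is_last = step + 1 == num_batches
        # Assumed remedy: also update on the final step when flush_tail is set.
        if is_boundary or (flush_tail and is_last):
            applied += pending  # stands in for optimizer.step() and zero_grad()
            pending = 0
    return applied


# 10 batches with accumulation over 4 steps: updates happen after batches 4 and 8,
# so the gradients of batches 9 and 10 are never applied.
print(count_applied_batches(10, 4))                   # 8
print(count_applied_batches(10, 4, flush_tail=True))  # 10
```

If the loss is divided by `gradient_accumulation_steps` before the backward pass, a flushed tail group would contribute a smaller effective gradient than a full group, so a real fix may also need to rescale for the shorter final group.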