Floris Fok

1 issue by Floris Fok

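If `max_steps` or the data length is not divisible by `gradient_accumulation_steps`, some gradients are lost. The update only takes place at `if (step + 1) % gradient_accumulation_steps == 0:`, so the gradients accumulated from the trailing steps after the last full group are never applied.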
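The effect can be illustrated with a minimal sketch that only counts micro-batches rather than training a model. The function `count_applied_batches` and the `flush_last` flag are hypothetical names introduced here for illustration and are not part of the code under discussion. `flush_last=True` shows one possible fix, assuming it is acceptable to run an extra update on the final step.

```python
def count_applied_batches(num_batches, gradient_accumulation_steps, flush_last=False):
    """Return how many micro-batches end up contributing to an update."""
    pending = 0  # micro-batches accumulated since the last update
    applied = 0  # micro-batches whose gradients reached an update
    for step in range(num_batches):
        pending += 1  # stands in for loss.backward()
        is_boundary = (step + 1) % gradient_accumulation_steps == 0
        is_last = step + 1 == num_batches
        if is_boundary or (flush_last and is_last):
            applied += pending  # stands in for optimizer.step() + zero_grad()
            pending = 0
    return applied


# 10 batches with accumulation over 4: the last 2 batches are never applied.
print(count_applied_batches(10, 4))                   # 8
# Also updating on the final step applies all 10.
print(count_applied_batches(10, 4, flush_last=True))  # 10
```

If the training loop divides the loss by `gradient_accumulation_steps`, the final partial group would be scaled as if it were a full group, so a real fix may also need to adjust that divisor for the last update.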