[trainer] fix bug in grad accum with multiple epochs
Please see https://github.com/huggingface/transformers/issues/22082 for the analysis printout of the problem.
But basically we have a bug in the grad accum machinery when `steps_in_epoch % gradient_accumulation_steps != 0`:
we always check `(step + 1) % gradient_accumulation_steps != 0` with a `step` counter that resets at every epoch, so when we hit the epoch boundary the leftover micro-batches carry over into the next epoch and we end up accumulating more than `gradient_accumulation_steps` batches into a single optimizer step.
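To make the failure mode concrete, here is a minimal standalone sketch (not the actual `Trainer` code; the counters and loop are simplified) of what the per-epoch `step` check does when `steps_in_epoch` is not a multiple of `gradient_accumulation_steps`:

```python
# Simplified sketch of the buggy bookkeeping: `step` resets to 0 at every
# epoch, so the micro-batches left over at the end of an epoch bleed into
# the next epoch's accumulation window.
num_epochs = 2
steps_in_epoch = 5                    # not divisible by gradient_accumulation_steps
gradient_accumulation_steps = 4

accumulated = 0
for epoch in range(num_epochs):
    for step in range(steps_in_epoch):
        accumulated += 1              # stands in for one backward() worth of gradients
        if (step + 1) % gradient_accumulation_steps == 0:
            print(f"epoch {epoch}, step {step}: optimizer.step() after {accumulated} micro-batches")
            accumulated = 0
# Output:
# epoch 0, step 3: optimizer.step() after 4 micro-batches
# epoch 1, step 3: optimizer.step() after 5 micro-batches   <-- more than gradient_accumulation_steps
```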
I proposed a fix that uses a total step counter instead - please feel free to suggest a different approach.
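As a rough illustration of that direction (the variable name below is just for the sketch, not necessarily what the PR ends up using), driving the modulo check with a counter that never resets across epochs keeps every optimizer step at exactly `gradient_accumulation_steps` micro-batches:

```python
# Same toy loop as above, but the modulo check is driven by a running
# counter that is never reset at epoch boundaries.
num_epochs = 2
steps_in_epoch = 5
gradient_accumulation_steps = 4

total_batched_samples = 0             # grows monotonically across all epochs
accumulated = 0
for epoch in range(num_epochs):
    for step in range(steps_in_epoch):
        total_batched_samples += 1
        accumulated += 1
        if total_batched_samples % gradient_accumulation_steps == 0:
            print(f"epoch {epoch}, step {step}: optimizer.step() after {accumulated} micro-batches")
            accumulated = 0
# Output:
# epoch 0, step 3: optimizer.step() after 4 micro-batches
# epoch 1, step 2: optimizer.step() after 4 micro-batches
```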
I left the debug prints in so you can validate the situation yourself; I will remove them once we're happy.
Fixes: https://github.com/huggingface/transformers/issues/22082