Fix Warmup and LR Schedule
There is confusion in the training code between iter_num (micro batch) and step_count (batch).
The meta parameters (the intervals and the warmup setting) refer either to the first one (like log_interval and max_iters) or to the second one (like eval_interval, save_interval, and warmup_iters).
The two differ by a factor of gradient_accumulation_iters.
This PR fixes warmup_iters (which is in batch units, not micro-batch units) so that the schedule actually reaches its constant phase. I also reduced the target LR from 9e-3 to 3e-3: with the previous settings, we were still in the warmup regime at the end of 5 epochs, so the LR never reached 9e-3 but only something close to 3e-3.
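For context, here is a minimal, self-contained sketch of the two counters (names and values are illustrative, not the actual script): iter_num advances with every micro batch (every backward pass), while step_count advances only when the optimizer steps, i.e. every gradient_accumulation_iters micro batches.

# Illustrative sketch only, under the assumptions stated above.
import torch

learning_rate = 3e-3             # target LR after warmup
warmup_iters = 10                # despite the name, compared against step_count below
gradient_accumulation_iters = 8
max_iters = 200                  # micro batches, so only 200 / 8 = 25 optimizer steps
step_count = 0

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for iter_num in range(max_iters):
    x = torch.randn(2, 4)
    loss = model(x).pow(2).mean()
    loss.backward()                                   # gradients accumulate across micro batches
    if (iter_num + 1) % gradient_accumulation_iters == 0:
        step_count += 1
        # Linear warmup applied per *step*, not per iter: if warmup_iters is
        # accidentally sized in micro-batch units, the constant phase may never be reached.
        if step_count <= warmup_iters:
            for group in optimizer.param_groups:
                group["lr"] = learning_rate * step_count / warmup_iters
        optimizer.step()
        optimizer.zero_grad()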
Distinct names for these easily confused variables would help.
@carmocca Is this of interest? I have some related suggestions and PRs, but I preferred to keep this one as small as possible for a start.
Thank you @AngainorDev.
Usually we refer to iter as something that goes with the micro batch (i.e. with every backward pass), while we use step for something that goes with the batch (from the optimizer step).
I agree that warmup_iters is confusing here, because we don't apply the schedule at every iter, but at every step.
And in fact we compare
if step_count <= warmup_iters:
so I agree it should be called warmup_steps.
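To make the proposed rename concrete, here is a small sketch of the per-step schedule (get_lr is a hypothetical helper name, not something from the codebase):

def get_lr(step_count: int, warmup_steps: int, learning_rate: float) -> float:
    """Linear warmup to learning_rate over warmup_steps optimizer steps."""
    if step_count <= warmup_steps:
        return learning_rate * step_count / warmup_steps
    return learning_rate

# e.g. with warmup_steps = 100 and learning_rate = 3e-3:
#   get_lr(50, 100, 3e-3)  -> 1.5e-3  (halfway through warmup)
#   get_lr(150, 100, 3e-3) -> 3e-3    (constant phase reached)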
@carmocca I propose we get the change in and rename to warmup_steps throughout the codebase.
This sounds good to me too. Sorry for the confusion!
@AngainorDev Would you like to update all occurrences together here?
@carmocca Sure, just done!
All the finetune/ and pretrain/ scripts should be updated too
Sorry! I'll process them as well asap.
Done for all finetune/ scripts.
Both scripts in pretrain/ use a different loop and rely on iters, with no explicit steps.
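For reference, a hedged sketch of how the two units relate, in case those loops ever need an equivalent setting (names and values are illustrative, assuming the same gradient_accumulation_iters factor as above):

# Illustrative only: converting a step-based warmup into the iter-based
# convention used by loops that only track micro batches.
gradient_accumulation_iters = 8
warmup_steps = 100                                          # in optimizer steps (batches)
warmup_iters = warmup_steps * gradient_accumulation_iters   # in micro batches (iters)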