Fix Warmup and LR Schedule
There is confusion in the training code between iter_num (micro batch) and step_count (batch).
The meta parameters (the intervals and the warmup setting) refer either to the first one (like log_interval and max_iters) or to the second one (like eval_interval, save_interval, and warmup_iters).
The two differ by a factor of gradient_accumulation_iters.
This PR fixes warmup_iters (which is in batch units, not micro-batch units) so that the schedule actually reaches its constant phase. I also reduced the target LR from 9e-3 to 3e-3: with the previous settings, we were still in the warmup regime at the end of 5 epochs, so the LR never reached 9e-3 but only something close to 3e-3.
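For context, here is a minimal, self-contained sketch of the two counters (names and values are illustrative, not the actual script): iter_num advances with every micro batch (every backward pass), while step_count advances only when the optimizer steps, i.e. every gradient_accumulation_iters micro batches.

# Illustrative sketch only, under the assumptions stated above.
import torch

learning_rate = 3e-3             # target LR after warmup
warmup_iters = 10                # despite the name, compared against step_count below
gradient_accumulation_iters = 8
max_iters = 200                  # micro batches, so only 200 / 8 = 25 optimizer steps
step_count = 0

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for iter_num in range(max_iters):
    x = torch.randn(2, 4)
    loss = model(x).pow(2).mean()
    loss.backward()                                   # gradients accumulate across micro batches
    if (iter_num + 1) % gradient_accumulation_iters == 0:
        step_count += 1
        # Linear warmup applied per *step*, not per iter: if warmup_iters is
        # accidentally sized in micro-batch units, the constant phase may never be reached.
        if step_count <= warmup_iters:
            for group in optimizer.param_groups:
                group["lr"] = learning_rate * step_count / warmup_iters
        optimizer.step()
        optimizer.zero_grad()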
Distinct names for these easily confused variables would help.
@carmocca Is this of interest? I have some related suggestions and PRs, but I preferred to keep this one as small as possible for a start.
Thank you @AngainorDev.
Usually we refer to iter as something that goes with the micro batch (i.e. with every backward pass), while we use step for something that goes with the batch (from the optimizer step).
I agree that warmup_iters is confusing here, because we don't apply the schedule at every iter, but at every step.
And in fact we compare
if step_count <= warmup_iters:
so I agree it should be called warmup_steps.
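To make the proposed rename concrete, here is a small sketch of the per-step schedule (get_lr is a hypothetical helper name, not something from the codebase):

def get_lr(step_count: int, warmup_steps: int, learning_rate: float) -> float:
    """Linear warmup to learning_rate over warmup_steps optimizer steps."""
    if step_count <= warmup_steps:
        return learning_rate * step_count / warmup_steps
    return learning_rate

# e.g. with warmup_steps = 100 and learning_rate = 3e-3:
#   get_lr(50, 100, 3e-3)  -> 1.5e-3  (halfway through warmup)
#   get_lr(150, 100, 3e-3) -> 3e-3    (constant phase reached)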
@carmocca I propose we get the change in and rename to warmup_steps throughout the codebase.
This sounds good to me too. Sorry for the confusion!
@AngainorDev Would you like to update all occurrences together here?
@carmocca Sure, just done!
All the finetune/ and pretrain/ scripts should be updated too
Sorry! I'll process them as well asap.
Done for all finetune/ scripts.
Both scripts in pretrain/ use a different loop and rely on iters, with no explicit steps.
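For reference, a hedged sketch of how the two units relate, in case those loops ever need an equivalent setting (names and values are illustrative, assuming the same gradient_accumulation_iters factor as above):

# Illustrative only: converting a step-based warmup into the iter-based
# convention used by loops that only track micro batches.
gradient_accumulation_iters = 8
warmup_steps = 100                                          # in optimizer steps (batches)
warmup_iters = warmup_steps * gradient_accumulation_iters   # in micro batches (iters)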