
Add Linear batch_size Scheduler

Open erikqu opened this issue 2 years ago • 4 comments

What Changed

  • adds a batch_size scheduler option to the training configs
  • linearly increases the training batch_size until the maximum batch_size is reached (see the sketch below)
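
A minimal sketch of the kind of step-wise schedule described here, assuming illustrative hyperparameter names (`batch_size_initial`, `increment_every`) rather than the actual PR code:

```python
def get_batch_size(it, batch_size_initial=4, max_batch_size=12, increment_every=3):
    """Step-wise schedule: grow the batch size by 1 every `increment_every`
    iterations, starting from `batch_size_initial`, capped at `max_batch_size`."""
    return min(batch_size_initial + it // increment_every, max_batch_size)
```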

Testing

  • verified training works with/without batch_size scheduler

Notes

  • closes one of the TODOs mentioned in the readme.

erikqu avatar Feb 18 '23 02:02 erikqu

I would add a param to control the starting point for the batch size instead of hardcoding it to 4.

chrisociepa avatar Feb 20 '23 19:02 chrisociepa

@erikqu Your solution is small and simple (which is awesome), but it increases batch_size every third step from some starting value (currently 4, though I expect you will turn that into a hyperparameter, as @chrisociepa suggested). When I think of a linear increase, I expect the value to grow over a specified range of iterations. For instance, max_iters defaults to 600k, so I would expect batch_size to increase linearly over that span (or perhaps only over the first half of training, i.e. 300k iterations, and then stay at the maximum).

As I see from the config files, batch_size is quite small, so with your approach it will climb from 4 to 20..30 rather quickly and, I suspect, the effect might be negligible.
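
For comparison, a minimal sketch of the ramp being suggested, assuming hypothetical parameter names (`batch_size_initial`, `warmup_iters`) and spreading the increase over the first half of training rather than every few steps:

```python
def get_batch_size(it, batch_size_initial=4, max_batch_size=12, warmup_iters=300_000):
    """Linear ramp: interpolate from batch_size_initial to max_batch_size
    over the first `warmup_iters` iterations, then hold the maximum."""
    if it >= warmup_iters:
        return max_batch_size
    frac = it / warmup_iters
    return int(batch_size_initial + frac * (max_batch_size - batch_size_initial))
```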

Andrei-Aksionov avatar Feb 21 '23 18:02 Andrei-Aksionov

Thanks for the feedback y'all, will update this shortly.

erikqu avatar Feb 23 '23 03:02 erikqu

I would love to see the same change for the gradient_accumulation_steps parameter since it is related to the batch size.
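
The same ramp could in principle be applied to gradient accumulation; a hypothetical sketch (parameter names are illustrative, not nanoGPT's actual config keys):

```python
def get_grad_accum_steps(it, initial_steps=1, max_steps=5, warmup_iters=300_000):
    """Apply the same linear ramp to gradient_accumulation_steps so the
    effective batch size (batch_size * grad_accum_steps) grows alongside it."""
    if it >= warmup_iters:
        return max_steps
    return int(initial_steps + (it / warmup_iters) * (max_steps - initial_steps))
```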

chrisociepa avatar Feb 24 '23 21:02 chrisociepa