nanoGPT
Add Linear batch_size Scheduler
What Changed
- adds an option to the training configs to enable the scheduler
- linearly increases the training batch_size until the max batch_size is reached (a rough sketch follows below)
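Roughly, the scheduling logic looks like this (a simplified sketch only, not the exact diff; parameter names such as batch_size_start and batch_size_step_every are illustrative):

```python
batch_size = 12            # target (max) batch size from the config
batch_size_start = 4       # hypothetical: starting batch size
batch_size_step_every = 3  # hypothetical: grow by 1 every N iterations

def get_batch_size(it):
    # Step the batch size up from batch_size_start until it reaches batch_size.
    grown = batch_size_start + it // batch_size_step_every
    return min(grown, batch_size)
```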
Testing
- verified training works with and without the batch_size scheduler
Notes
- closes one of the TODOs mentioned in the readme.
I would add a param to control the starting point for the batch size instead of hardcoding it to 4
@erikqu Your solution is small and simple (which is awesome), but it increases batch_size every third step from some starting value (currently 4, though I think you will make it a hyperparameter as @chrisociepa suggested). When I think of a linear increase, I expect the value to grow over a specified range. For instance, by default the max_iters value is 600k. I would expect the batch_size to increase linearly over that number of iterations (or maybe not to the end but to the middle of training, 300k in this case, and then stay at the maximum level).
As I can see from the config files, the batch_size is quite small, so with your approach the batch_size will go from 4 to 20-30 rather quickly and, I suppose, the effect might be negligible.
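For context, the schedule being suggested here might look something like the following (a hedged sketch only; batch_size_start and batch_size_ramp_iters are hypothetical names, not from the PR):

```python
max_iters = 600000
batch_size = 12                          # maximum batch size from the config
batch_size_start = 4                     # hypothetical starting value
batch_size_ramp_iters = max_iters // 2   # hypothetical: finish the ramp at mid-training

def get_batch_size(it):
    # Interpolate linearly from batch_size_start to batch_size over the ramp,
    # then hold at the maximum for the rest of training.
    if it >= batch_size_ramp_iters:
        return batch_size
    frac = it / batch_size_ramp_iters
    return int(batch_size_start + frac * (batch_size - batch_size_start))
```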
Thanks for the feedback y'all, will update this shortly.
I would love to see the same change for the gradient_accumulation_steps parameter, since it is related to the batch size.
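A sketch of how the same ramp could apply to gradient_accumulation_steps (only gradient_accumulation_steps itself is an existing nanoGPT config value; the ramp parameters below are purely illustrative):

```python
gradient_accumulation_steps = 40   # target value from the config
grad_accum_start = 8               # hypothetical starting value
grad_accum_ramp_iters = 300000     # hypothetical ramp length in iterations

def get_grad_accum_steps(it):
    # Linearly ramp the accumulation steps, never dropping below 1,
    # and hold at the configured value once the ramp is complete.
    if it >= grad_accum_ramp_iters:
        return gradient_accumulation_steps
    frac = it / grad_accum_ramp_iters
    return max(1, int(grad_accum_start + frac * (gradient_accumulation_steps - grad_accum_start)))
```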