How to set max_iters
I am trying to pretrain a model on a book corpus, and I was wondering: how do I make sure the training runs for 10,000 iterations?
There is a `max_iters` option in `eval`, but not in `train`.
I am just using the command from the documentation on the home page:

```sh
litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model
```
How do I pass a max-iterations value? Thanks!
This is currently not possible without modifying the code. It would be nice to add support for `train.max_steps`, but that's not implemented yet:
https://github.com/Lightning-AI/litgpt/blob/221b7ef54161272162aa9b036f1ef3674f3160a4/litgpt/pretrain.py#L427
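In the meantime, a rough workaround is to translate a target iteration count into a `--train.max_tokens` value, since the pretraining loop stops based on tokens consumed. A quick back-of-the-envelope sketch, where the global batch size and context length are placeholder values (substitute the ones from your own config):

```python
# Convert a target iteration count into an approximate token budget.
# Both values below are hypothetical -- take them from your actual config.
target_iters = 10_000
global_batch_size = 512   # hypothetical: sequences per optimizer step
context_length = 2048     # hypothetical: tokens per sequence

tokens_per_iter = global_batch_size * context_length
max_tokens = target_iters * tokens_per_iter
print(max_tokens)  # pass this as --train.max_tokens
```

This only approximates the iteration count, but it lets you control the training duration without touching the code.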
ok thanks
Oh we can keep it open actually, I think it would be a nice thing to add some day. Thanks for raising that!
Could we add training steps to the TinyLlama tutorial?
Or to the TinyLlama implementation with the OpenWebText dataset?
I'd say we ideally need to add it to the pretrain code (https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py) so that it can be used in general with all datasets.
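For reference, a `train.max_steps` option would essentially amount to one extra stopping condition in the training loop. A minimal sketch with hypothetical names (not LitGPT's actual code), assuming the loop already tracks steps and a token budget:

```python
# Minimal sketch of a training loop that stops on either a token budget
# or an optional step limit. Names and structure are illustrative only.
def train(max_tokens, max_steps=None, tokens_per_step=1_048_576):
    step = 0
    tokens_seen = 0
    while tokens_seen < max_tokens:
        if max_steps is not None and step >= max_steps:
            break  # stop early once the step budget is exhausted
        # ... forward pass, backward pass, optimizer step would go here ...
        step += 1
        tokens_seen += tokens_per_step
    return step, tokens_seen
```

With `max_steps=None` this behaves exactly like the current token-budget loop, so adding the option would be backward compatible.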