
How to set max_iters

Open srivassid opened this issue 1 year ago • 5 comments

I am trying to pretrain a model on a book corpus, and I was wondering: how do I make sure the training runs for 10,000 iterations?

There is a max_iters option in eval, but not in train.

I am just using the documentation from the home page.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

How do I pass the max_iters value? Thanks

srivassid avatar May 25 '24 14:05 srivassid

This is currently not possible without modifying the code. It would be nice to add support for train.max_steps, but that's not implemented yet:

https://github.com/Lightning-AI/litgpt/blob/221b7ef54161272162aa9b036f1ef3674f3160a4/litgpt/pretrain.py#L427

rasbt avatar May 25 '24 15:05 rasbt
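In the meantime, the run length can be controlled indirectly through --train.max_tokens, since the number of optimizer steps is roughly the token budget divided by the tokens consumed per step. A minimal sketch of that arithmetic (the device, batch-size, and block-size values below are illustrative assumptions, not litgpt defaults):

```python
def estimated_steps(max_tokens: int, devices: int, micro_batch_size: int,
                    grad_accum_steps: int, block_size: int) -> int:
    """Roughly how many optimizer steps a token budget buys.

    tokens per optimizer step =
        devices * micro_batch_size * grad_accum_steps * block_size
    """
    tokens_per_step = devices * micro_batch_size * grad_accum_steps * block_size
    return max_tokens // tokens_per_step


# Example: to target ~10,000 steps under these assumed settings,
# work backwards from tokens consumed per step.
tokens_per_step = 1 * 4 * 4 * 2048       # 32,768 tokens per step (assumed)
budget = 10_000 * tokens_per_step        # 327,680,000 tokens
assert estimated_steps(budget, 1, 4, 4, 2048) == 10_000
```

So setting --train.max_tokens to the computed budget approximates a fixed iteration count, as long as the effective batch configuration is known.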

ok thanks

srivassid avatar May 25 '24 15:05 srivassid

Oh we can keep it open actually, I think it would be a nice thing to add some day. Thanks for raising that!

rasbt avatar May 25 '24 16:05 rasbt

Can we add training steps to the TinyLlama tutorial?

Or to the TinyLlama implementation with the OpenWebText dataset?

srivassid avatar May 25 '24 16:05 srivassid

I'd say we ideally need to add it to the pretrain code (https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py) so that it can be used in general with all datasets.

rasbt avatar May 26 '24 00:05 rasbt
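For anyone patching this locally in the meantime, the change amounts to adding a step cap to the training loop. A minimal, generic sketch of such a guard (this is not litgpt's actual code; max_steps and train_step are placeholder names):

```python
from typing import Callable, Iterable, List, Optional


def train(data: Iterable,
          train_step: Callable[[object], float],
          max_steps: Optional[int] = None) -> List[float]:
    """Run train_step over batches, stopping early once max_steps is reached."""
    losses: List[float] = []
    for step, batch in enumerate(data):
        if max_steps is not None and step >= max_steps:
            break  # hard cap on iterations, independent of dataset size
        losses.append(train_step(batch))
    return losses


# With a cap of 3, only the first 3 batches are processed:
losses = train(range(10), lambda b: float(b), max_steps=3)
assert losses == [0.0, 1.0, 2.0]
```

Making the cap optional (None by default) keeps existing token-budget-driven runs unchanged while letting users who need an exact iteration count opt in.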