
How to set max_iters

Open srivassid opened this issue 1 year ago • 5 comments

I am trying to pretrain a model on a book corpus, and I was wondering: how do I make sure the training runs for 10,000 iterations?

There is a max_iters option in eval, but not in train.

I am just using the documentation from the home page.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

How do I pass the max_iters value? Thanks

srivassid avatar May 25 '24 14:05 srivassid

This is currently not possible without modifying the code. It would be nice to add support for train.max_steps, but that's not implemented yet:

https://github.com/Lightning-AI/litgpt/blob/221b7ef54161272162aa9b036f1ef3674f3160a4/litgpt/pretrain.py#L427

rasbt avatar May 25 '24 15:05 rasbt
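In the meantime, the run length can be controlled indirectly through --train.max_tokens, since the number of optimizer steps is roughly the token budget divided by the tokens consumed per step. A minimal sketch of that arithmetic (the device, batch-size, and block-size values below are illustrative assumptions, not litgpt defaults):

```python
def estimated_steps(max_tokens: int, devices: int, micro_batch_size: int,
                    grad_accum_steps: int, block_size: int) -> int:
    """Roughly how many optimizer steps a token budget buys.

    tokens per optimizer step =
        devices * micro_batch_size * grad_accum_steps * block_size
    """
    tokens_per_step = devices * micro_batch_size * grad_accum_steps * block_size
    return max_tokens // tokens_per_step


# Example: to target ~10,000 steps under these assumed settings,
# work backwards from tokens consumed per step.
tokens_per_step = 1 * 4 * 4 * 2048       # 32,768 tokens per step (assumed)
budget = 10_000 * tokens_per_step        # 327,680,000 tokens
assert estimated_steps(budget, 1, 4, 4, 2048) == 10_000
```

So setting --train.max_tokens to the computed budget approximates a fixed iteration count, as long as the effective batch configuration is known.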

ok thanks

srivassid avatar May 25 '24 15:05 srivassid

Oh we can keep it open actually, I think it would be a nice thing to add some day. Thanks for raising that!

rasbt avatar May 25 '24 16:05 rasbt

Can we add training steps to the TinyLlama tutorial?

Or to the TinyLlama implementation with the OpenWebText dataset?

srivassid avatar May 25 '24 16:05 srivassid

I'd say we ideally need to add it to the pretrain code (https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py) so that it can be used in general with all datasets.

rasbt avatar May 26 '24 00:05 rasbt
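For anyone patching this locally in the meantime, the change amounts to adding a step cap to the training loop. A minimal, generic sketch of such a guard (this is not litgpt's actual code; max_steps and train_step are placeholder names):

```python
from typing import Callable, Iterable, List, Optional


def train(data: Iterable,
          train_step: Callable[[object], float],
          max_steps: Optional[int] = None) -> List[float]:
    """Run train_step over batches, stopping early once max_steps is reached."""
    losses: List[float] = []
    for step, batch in enumerate(data):
        if max_steps is not None and step >= max_steps:
            break  # hard cap on iterations, independent of dataset size
        losses.append(train_step(batch))
    return losses


# With a cap of 3, only the first 3 batches are processed:
losses = train(range(10), lambda b: float(b), max_steps=3)
assert losses == [0.0, 1.0, 2.0]
```

Making the cap optional (None by default) keeps existing token-budget-driven runs unchanged while letting users who need an exact iteration count opt in.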