ChaosCodes
> Hi! My training crashed, and I couldn't find the code to resume training from the last saved checkpoint. How can I resume my training? How do you handle this?...
Hi, thanks for your interest. You can try the 3T version of TinyLlama via https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T/tree/main, but as you can see, the model may have already saturated before 3T. We are conducting...
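For reference, a minimal sketch for trying that 3T checkpoint with the `transformers` library (assuming `transformers` is installed; the repo ID is taken from the URL above, everything else is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo ID taken from the link above; loaded in default precision.
repo_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Quick sanity check: generate a few tokens.
inputs = tokenizer("The TinyLlama project is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```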
Hi, you can determine `max_steps` based on how many tokens you want to train on when using the cosine LR schedule.
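As a rough back-of-the-envelope sketch, you can derive `max_steps` from the token budget like this (the batch size, sequence length, and device count below are placeholder assumptions, not our actual config):

```python
# Rough sketch: derive max_steps from a token budget.
# All numbers below are placeholder assumptions, not the official TinyLlama config.
target_tokens = 1_000_000_000_000   # e.g. a 1T-token budget
micro_batch_size = 8                # sequences per device per step
gradient_accumulation = 4
num_devices = 16
block_size = 2048                   # sequence length in tokens

tokens_per_step = micro_batch_size * gradient_accumulation * num_devices * block_size
max_steps = target_tokens // tokens_per_step
print(f"tokens/step = {tokens_per_step:,}, max_steps = {max_steps:,}")
```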
Hi, we are working on these two datasets and will release the scripts when we finish.
Hi, currently our training datasets mainly contain English corpora, so I think not much Spanish was seen during pretraining. However, I think you can collect over 50B high...
> How much would it cost to train a TinyLlama in Spanish?

It depends on your token count. For example, you need about half a month for ~250B tokens under...
I am not sure how many tokens are required to get a good continual-pretrained model for Spanish. Maybe it will be less than 250B. Sorry, I have no experience with that.
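To make the "half a month for ~250B tokens" figure concrete, here is a rough arithmetic sketch; the per-GPU throughput and GPU count are assumptions for illustration, not measured numbers for your setup:

```python
# Rough training-time estimate from a token budget.
# Throughput and GPU count below are illustrative assumptions only.
target_tokens = 250_000_000_000      # ~250B tokens
tokens_per_sec_per_gpu = 12_000      # assumed throughput; measure on your own hardware
num_gpus = 16

total_seconds = target_tokens / (tokens_per_sec_per_gpu * num_gpus)
print(f"~{total_seconds / 86_400:.1f} days")   # ~15 days with these assumptions
```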
Hi, you can download the tokenizer with `mkdir data && cd data && mkdir llama && cd llama && wget https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T/resolve/main/tokenizer.model && cd ../..` (note the `resolve` path; the `blob` URL returns the HTML page instead of the raw file).
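If you prefer Python, a small sketch using `huggingface_hub` (assuming the package is installed) does the same thing:

```python
from huggingface_hub import hf_hub_download

# Download tokenizer.model into data/llama/ (the directory is created if missing).
path = hf_hub_download(
    repo_id="TinyLlama/TinyLlama-1.1B-intermediate-step-480k-1T",
    filename="tokenizer.model",
    local_dir="data/llama",
)
print(path)
```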
Hi, I think the speed depends on how many CPU cores you have. When we use 128 cores, it seems to take about a day to do this.
Hi, you can check [this](https://lightning.ai/docs/fabric/stable/api/fabric_args.html#precision) to enable FP32 training.
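For example, a minimal sketch of requesting full-precision training with Lightning Fabric (the rest of the training setup is omitted; accelerator and device count are illustrative):

```python
from lightning.fabric import Fabric

# "32-true" requests full FP32 precision (see the Fabric precision docs linked above).
fabric = Fabric(accelerator="auto", devices=1, precision="32-true")
fabric.launch()
```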