lammoh
I'm using the pretrain code with Falcon-7B. I've noticed that the loss didn't change for 400 iterations. ``` iter 1: loss 11.0666, time: 13381.00ms, speed: 306 toks/s/device .... iter 400:...
Hello, I'm using the pretrain code to train Falcon-7B; I've already used lit-llama and trained llama-7B. I noticed that Falcon is very slow compared to LLaMA, and it takes more...
I'd like to request support for BLOOM, as it was pretrained on many languages.
Based on the discussion here: https://github.com/Lightning-AI/lit-llama/pull/435#issuecomment-1667966748, the current code can only convert the base model into Hugging Face format. Converting an adapter requires different code, so I'd like to request your...
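For illustration, here's a rough sketch of the merging step such a converter would need first, assuming both checkpoints are plain PyTorch state dicts; the file paths are hypothetical, and the merged dict still contains adapter-specific tensors that the base-model converter does not know how to map:

```python
# Sketch only: overlay an adapter-only checkpoint on top of the base weights.
# The file paths below are hypothetical examples, not fixed locations in the repo.
import torch

base_state = torch.load("checkpoints/lit-llama/7B/lit-llama.pth", map_location="cpu")
adapter_state = torch.load("out/adapter/lit-llama-adapter-finetuned.pth", map_location="cpu")

merged = dict(base_state)
merged.update(adapter_state)  # adapter checkpoints store only the adapter parameters

torch.save(merged, "out/adapter/lit-llama-merged.pth")
```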
Hi everyone, do you have resources that could help me understand `PackedDataset`? I'm trying to implement two things: **(1) A multiprocessing script for tokenization:** this is done; I implemented the...
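For reference, here's a minimal sketch of the kind of multiprocessing tokenization I mean, using a plain worker pool over input files; `tokenize_file` is an illustrative stand-in, not the repository's `PackedDataset` API:

```python
# Sketch: tokenize many raw-text files in parallel, one worker per file.
from multiprocessing import Pool
from pathlib import Path

def tokenize_file(path: Path) -> int:
    """Stand-in tokenizer: split on whitespace and write the result next to the input."""
    text = path.read_text(encoding="utf-8")
    tokens = text.split()                      # replace with a real tokenizer call
    path.with_suffix(".tok").write_text(" ".join(tokens), encoding="utf-8")
    return len(tokens)

if __name__ == "__main__":
    files = sorted(Path("data/raw").glob("*.txt"))
    with Pool(processes=8) as pool:
        counts = pool.map(tokenize_file, files)
    print(f"files: {len(files)}, total tokens: {sum(counts)}")
```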
I think #357 should be applied to the pretrain script as well. Thank you so much, Lightning team, for this amazing repository.
Hello, according to our discussion [here](https://github.com/Lightning-AI/lit-llama/issues/330#issuecomment-1567376696), I think `devices` should be changed in the [pretraining code](https://github.com/Lightning-AI/lit-llama/blob/main/pretrain/redpajama.py#L117) to `fabric.world_size`, since the batch size refers to the global batch size. `devices` in...
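To make the proposal concrete, here is a small runnable sketch of the intended computation; the hyperparameter values are illustrative, not the script's actual defaults:

```python
# Sketch: derive gradient accumulation from the *global* batch size using
# fabric.world_size (devices * num_nodes) rather than the per-node `devices` count.
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

batch_size = 125        # illustrative global batch size
micro_batch_size = 5    # illustrative per-device micro batch size

gradient_accumulation_iters = batch_size // fabric.world_size // micro_batch_size
print(gradient_accumulation_iters)  # 25 when world_size == 1
```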
I would like to request a new feature in the code: the ability to resume training from a checkpoint. Currently, the code can save a checkpoint of the model's state...
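As a sketch of what I have in mind, Lightning Fabric's `fabric.save`/`fabric.load` can already round-trip a state dict that bundles the weights, optimizer state, and step counter; the toy model, paths, and intervals below are illustrative only:

```python
# Sketch: resumable training state with Fabric (toy model and intervals for illustration).
from pathlib import Path

import torch
import lightning as L

fabric = L.Fabric(accelerator="cpu", devices=1)
fabric.launch()

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = fabric.setup(model, optimizer)

ckpt_path = Path("out/pretrain/last.ckpt")
ckpt_path.parent.mkdir(parents=True, exist_ok=True)

# Everything needed to resume lives in one dict: weights, optimizer state, step counter.
state = {"model": model, "optimizer": optimizer, "iter_num": 0}
if ckpt_path.exists():
    fabric.load(ckpt_path, state)  # restores the contents of `state` in place

max_iters, save_interval = 100, 25
while state["iter_num"] < max_iters:
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()
    fabric.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
    state["iter_num"] += 1
    if state["iter_num"] % save_interval == 0:
        fabric.save(ckpt_path, state)  # a later run picks up from here
```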
Hi, I'm using multi-node training and I need to know how to calculate the hyperparameter values in the train_redpajama script. Can you please elaborate more on how to set these...
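In case it helps frame the question, this is the arithmetic I'm assuming ties the values together; the numbers are illustrative, not the script's defaults:

```python
# Illustrative multi-node bookkeeping; not the script's actual default values.
num_nodes = 2
devices_per_node = 8
world_size = num_nodes * devices_per_node            # total processes = 16

global_batch_size = 512      # sequences per optimizer step across all devices
micro_batch_size = 4         # sequences per forward/backward pass on one device
block_size = 2048            # tokens per sequence

gradient_accumulation_iters = global_batch_size // world_size // micro_batch_size  # 8
tokens_per_optimizer_step = global_batch_size * block_size                          # 1,048,576
print(world_size, gradient_accumulation_iters, tokens_per_optimizer_step)
```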
Changing the model at inference time from sampling to greedy decoding leads to generating empty audio files. When printing the generated audio sequences, I can see that the tokens are being repeated. This...
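For context, here is a minimal single-step sketch of the two decoding modes I'm switching between; the vocabulary size and the temperature/top-k values are illustrative:

```python
# Sketch: greedy decoding vs. temperature/top-k sampling for one generation step.
import torch

logits = torch.randn(32000)              # illustrative vocabulary size

# Greedy: always take the argmax, deterministic and prone to repetition loops.
greedy_token = torch.argmax(logits)

# Sampling: temperature plus top-k keeps some diversity in the output sequence.
temperature, top_k = 0.8, 200
scaled = logits / temperature
v, _ = torch.topk(scaled, top_k)
scaled[scaled < v[-1]] = -float("inf")   # mask everything outside the top-k
probs = torch.softmax(scaled, dim=-1)
sampled_token = torch.multinomial(probs, num_samples=1)
print(greedy_token.item(), sampled_token.item())
```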