litgpt
Training time is unexpectedly very slow compared to lit-llama
Hello,
I'm using the pretrain code to train Falcon-7B; I've previously used lit-llama to train LLaMA-7B.
I noticed that Falcon is much slower than LLaMA and uses more memory.
With LLaMA-7B:
iter 2: loss 11.0692, time: 5024.25ms, speed: 1705 toks/s/device
With Falcon-7B:
iter 2: loss 11.0666, time: 26360.27ms, speed: 388 toks/s/device
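For reference, a quick back-of-the-envelope check of those two log lines (a sketch only; the exact tokens per iteration depend on the micro batch size and sequence length each script was configured with):

```python
# Sanity check of the reported numbers; values copied from the two log lines above.
llama_time_s, llama_toks_per_s = 5.02425, 1705
falcon_time_s, falcon_toks_per_s = 26.36027, 388

# Approximate tokens processed per device in one iteration.
llama_toks_per_iter = llama_time_s * llama_toks_per_s     # ~8,566 tokens
falcon_toks_per_iter = falcon_time_s * falcon_toks_per_s  # ~10,228 tokens

print(f"LLaMA  tokens/iter/device: {llama_toks_per_iter:,.0f}")
print(f"Falcon tokens/iter/device: {falcon_toks_per_iter:,.0f}")
print(f"Throughput gap (LLaMA / Falcon): {llama_toks_per_s / falcon_toks_per_s:.1f}x")
```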
Also, Falcon consumes a lot more memory: I couldn't increase the batch size beyond 160 with a micro batch size of 5, while with LLaMA I could go up to 384 with a micro batch size of 6. Is this normal?
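For context, this is roughly how the global batch size and micro batch size relate in these pretrain scripts; the variable names below mirror the usual lit-gpt/lit-llama hyperparameters, but they are an assumption about the exact script that was run:

```python
# Assumed values from the report above; names mirror the typical pretrain hyperparameters.
batch_size = 160        # largest global batch size that fit for Falcon-7B
micro_batch_size = 5    # samples per forward/backward pass on each device

# Gradients are accumulated over several micro batches to reach the global batch size.
gradient_accumulation_iters = batch_size // micro_batch_size
print(gradient_accumulation_iters)  # 32

# Note: peak activation memory is driven by micro_batch_size * seq_length (and attention
# scores grow with seq_length**2), so the global batch size alone doesn't explain OOMs.
```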
I'm also hitting CUDA out-of-memory errors with models and data that I would expect to fit comfortably on a 40GB A100 MIG.
I'm not familiar with the lit-llama codebase, so I'm not sure what might be different in lit-parrot, but I wanted to note that I'm seeing something similar.
Do you still see this behaviour, and if so, can you share exactly the code you ran and the arguments passed?
This is because LLaMA fine-tuning is hardcoded to use a max_seq_length of 256:
https://github.com/Lightning-AI/lit-llama/blob/main/scripts/prepare_alpaca.py#L26
https://github.com/Lightning-AI/lit-llama/blob/main/finetune/adapter.py#L52
whereas this repository is configured to use the longest sequence length in Alpaca, which is 1037. If you override it to 256 in https://github.com/Lightning-AI/lit-gpt/blob/main/finetune/adapter.py#L30, you should see the times match.
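To illustrate why the configured sequence length matters so much, here is a rough per-sample cost comparison between 256 and 1037 tokens (a simplified sketch: linear layers scale about linearly with token count, while attention-score memory grows quadratically):

```python
# Rough, model-agnostic comparison of per-sample cost at the two sequence lengths.
short_seq, long_seq = 256, 1037

token_ratio = long_seq / short_seq        # ~4.1x more tokens per sample
attention_score_ratio = token_ratio ** 2  # ~16.4x larger attention score matrices

print(f"Tokens per sample:      {token_ratio:.1f}x")
print(f"Attention score memory: {attention_score_ratio:.1f}x")
```

This only shows the direction and rough scale of the effect; exact numbers depend on the model architecture and kernels used.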
Actually, I was using the pretrain script, and I think the max token length is fixed in both lit-llama and lit-gpt?