
Training time is unexpectedly very slow compared to lit-llama

Open LamOne1 opened this issue 1 year ago • 2 comments

Hello,

I'm using the pretrain code to train falcon-7B. I've already used lit-llama to train llama-7B, and I noticed that falcon is much slower than llama and takes more memory:

llama 7B: iter 2: loss 11.0692, time: 5024.25ms, speed: 1705 toks/s/device
falcon 7B: iter 2: loss 11.0666, time: 26360.27ms, speed: 388 toks/s/device

Also, falcon consumes a lot of memory: I couldn't increase the batch size beyond 160 with micro batch size 5, while with llama I went up to 384 with micro batch size 6. Is this normal?

LamOne1 avatar Jun 08 '23 05:06 LamOne1
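For context, the batch sizes quoted above are effective batch sizes reached through gradient accumulation; only micro_batch_size sequences are held in memory at once. Below is a minimal sketch of that bookkeeping, using the falcon numbers from the report; the exact configuration the reporter ran is an assumption.

```python
# Minimal sketch of the standard gradient-accumulation bookkeeping used by the
# pretraining scripts. The concrete numbers are taken from the report above;
# everything else is an assumption about the exact configuration used.
batch_size = 160            # effective batch size that fit for falcon-7B
micro_batch_size = 5        # sequences materialized per forward/backward pass

# Gradients are accumulated over this many micro-batches before an optimizer step.
gradient_accumulation_steps = batch_size // micro_batch_size   # 32

# Peak activation memory tracks micro_batch_size * max_seq_length, not batch_size,
# so longer sequences force a smaller micro batch (and hence a smaller batch size).
print(f"accumulation steps: {gradient_accumulation_steps}")
```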

I'm also hitting some CUDA out of memory errors on models + data that I expect to more easily fit on a 40GB A100 MiG.

I'm not familiar with the lit-llama codebase, so I'm not sure what's potentially different in lit-parrot but wanted to note that I'm seeing something similar.

iskandr avatar Jun 08 '23 21:06 iskandr

Do you still see this behaviour, and if so, can you share exactly the code you ran and the arguments passed?

carmocca avatar Jun 21 '23 16:06 carmocca

This is because lit-llama's fine-tuning is hardcoded to use a max_seq_length of 256: https://github.com/Lightning-AI/lit-llama/blob/main/scripts/prepare_alpaca.py#L26 https://github.com/Lightning-AI/lit-llama/blob/main/finetune/adapter.py#L52

Whereas this repository is configured to use the longest sequence length in alpaca: 1037. If you override it to 256 in https://github.com/Lightning-AI/lit-gpt/blob/main/finetune/adapter.py#L30, you should see the times match.

carmocca avatar Jun 22 '23 23:06 carmocca
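To make the explanation above concrete: each iteration processes micro_batch_size × max_seq_length tokens, and the attention score matrices grow quadratically with the sequence length. Below is a rough back-of-the-envelope comparison of 256 vs. 1037; micro batch size 5 is taken from the report, and this is an illustration, not a profile.

```python
# Rough arithmetic, not a profile: illustrates why a ~4x longer max_seq_length
# makes each iteration do much more work. micro_batch_size matches the report above.
micro_batch_size = 5

for max_seq_length in (256, 1037):
    tokens_per_micro_batch = micro_batch_size * max_seq_length
    # Self-attention scores grow quadratically with sequence length
    # (per head, per sequence), which also drives activation memory.
    attention_elements = max_seq_length ** 2
    print(f"max_seq_length={max_seq_length:5d}  "
          f"tokens/micro-batch={tokens_per_micro_batch:6d}  "
          f"attention elements/seq={attention_elements:10d}")
```

The quadratic term also drives activation memory, which would be consistent with the smaller batch size that fit for the longer-sequence run.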

Actually, I was using the pretrain script, and I think the max sequence length is fixed in both lit-llama and lit-gpt?

LamOne1 avatar Jun 25 '23 20:06 LamOne1
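For the pretraining case, the context length comes from the model configuration rather than from a dataset-preparation script. One way to check what each codebase actually uses is sketched below; this assumes both packages are installed and that their config classes expose block_size, as in the releases current at the time of this thread (the lit-gpt module name has changed across releases, so adjust the import if needed).

```python
# Hedged sketch: assumes lit_gpt and lit_llama are importable and that their
# config objects expose `block_size`, the context length used for pretraining.
from lit_gpt import Config as GPTConfig   # module name has varied across releases
from lit_llama import LLaMAConfig

falcon_cfg = GPTConfig.from_name("falcon-7b")
llama_cfg = LLaMAConfig.from_name("7B")

# If these differ, the two pretraining runs are not doing comparable work per
# iteration, which would account for part of the speed and memory gap.
print("falcon-7b block_size:", falcon_cfg.block_size)
print("llama 7B block_size:", llama_cfg.block_size)
```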