Is it possible to further reduce the RAM usage?
I use multiple A6000 cards for pretraining. Each card has 49140 MiB of memory.
I tried to pretrain LLaMA-7B with `bf16-mixed` and the following settings:
- `batch_size = 60` (default is 125)
- `micro_batch_size = 1` (1 × 4 = 4 per iteration)
It works well before backpropagation, already using 47+ of the ~48 GB on each card, but it goes OOM when it reaches the 15th step (when backpropagation runs).
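For context, my understanding of how the pretraining loop combines these two settings is roughly the following (a minimal sketch, not the exact lit-llama code; `get_batch` and `gradient_accumulation_iters` are assumed names):

```python
import torch.nn.functional as F

batch_size = 60
micro_batch_size = 1
# the effective batch is built up by accumulating gradients over many micro-steps
gradient_accumulation_iters = batch_size // micro_batch_size  # 60

def train_step(fabric, model, optimizer, train_data, step):
    # each micro-step only holds activations for `micro_batch_size` samples,
    # but full-model gradients stay allocated until the optimizer step,
    # which is why memory peaks once backward() starts running
    input_ids, targets = get_batch(train_data, micro_batch_size)  # hypothetical helper
    logits = model(input_ids)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    fabric.backward(loss / gradient_accumulation_iters)
    if (step + 1) % gradient_accumulation_iters == 0:
        optimizer.step()
        optimizer.zero_grad()
```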
Is there a way to make this work? I can think of the following options, both of which would work, but I don't think either is the best choice (both are sketched below):
- Change the precision from `bf16-mixed` to `bf16-true`. But as the BLOOM paper said, bfloat16 mixed-precision training can solve the instability problem.
- Reduce the context length (block size).
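To make the two options concrete, here is a minimal sketch of the kind of change I mean (the import path, config names, and default block size are my assumptions about how lit-llama is laid out, not a tested patch):

```python
import lightning as L
from lit_llama.model import LLaMA, LLaMAConfig  # import path assumed

# Option 1: full bf16 instead of mixed precision
# (no fp32 master copies, but loses the stability benefit BLOOM describes)
fabric = L.Fabric(devices=4, precision="bf16-true")  # instead of "bf16-mixed"
fabric.launch()

# Option 2: shorter context length, which shrinks activation memory
config = LLaMAConfig.from_name("7B")
config.block_size = 1024  # assumed default is 2048
model = LLaMA(config)
```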
Yes, those are both ways to reduce the memory requirement. I will also land a fix soon that brings back flash attention: https://github.com/Lightning-AI/lit-parrot/pull/171
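The core of that change is to route attention through PyTorch 2.0's fused kernel so the FlashAttention backend can be selected, roughly along these lines (a simplified sketch, not the exact diff from the PR):

```python
import torch.nn.functional as F

def causal_attention(q, k, v):
    # The fused kernel can dispatch to FlashAttention, which never materializes
    # the full (T x T) attention matrix, so activation memory drops sharply for
    # long block sizes. Passing an explicit attn_mask can force a fallback to a
    # slower, more memory-hungry backend, so the causal mask is left to the
    # kernel via is_causal=True.
    return F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
```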
Hi @carmocca, can https://github.com/Lightning-AI/lit-parrot/pull/171 save some memory during pretraining and fine-tuning?
Yes. Would you like to port the changes from that PR here? Otherwise, I can do it myself.
Hi @carmocca, I am not familiar with that code, so I am afraid I can't port the changes from lit-parrot to lit-llama here.
@carmocca I made a PR for lit-llama following your PR. However, after I ran finetune/adapter.py on an A100, memory usage increased from 19.5 GB to 20.3 GB. Any idea what I did wrong?
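One way to narrow this down (a sketch, assuming PyTorch 2.0's SDPA backend controls; `run_finetuning_iterations` is a placeholder) is to force the flash backend, so a silent fallback to a slower kernel raises an error instead of just using more memory, and to compare peak allocation:

```python
import torch
from torch.backends.cuda import sdp_kernel

torch.cuda.reset_peak_memory_stats()

# forbid the math/mem-efficient fallbacks: if flash cannot be used
# (unsupported dtype, explicit mask, head size, ...), this raises an error
# instead of silently picking a backend that allocates more memory
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    run_finetuning_iterations()  # placeholder: a few forward/backward steps

print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```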