Is it possible to further reduce the RAM usage?
I use multiple A6000 cards for pretraining. Each card has 49140 MiB of memory.
I tried to pretrain LLaMA-7B with `bf16-mixed` and the following settings:
- `batch_size = 60` (default is 125)
- `micro_batch_size = 1` (1 × 4 = 4 per iteration)
It works well before backpropagation, already using 47+ of the ~48 GB on each card, but it goes OOM when it reaches the 15th step (when backpropagation runs).
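For context, my understanding of how the pretraining loop combines these two settings is roughly the following (a minimal sketch, not the exact lit-llama code; `get_batch` and `gradient_accumulation_iters` are assumed names):

```python
import torch.nn.functional as F

batch_size = 60
micro_batch_size = 1
# the effective batch is built up by accumulating gradients over many micro-steps
gradient_accumulation_iters = batch_size // micro_batch_size  # 60

def train_step(fabric, model, optimizer, train_data, step):
    # each micro-step only holds activations for `micro_batch_size` samples,
    # but full-model gradients stay allocated until the optimizer step,
    # which is why memory peaks once backward() starts running
    input_ids, targets = get_batch(train_data, micro_batch_size)  # hypothetical helper
    logits = model(input_ids)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    fabric.backward(loss / gradient_accumulation_iters)
    if (step + 1) % gradient_accumulation_iters == 0:
        optimizer.step()
        optimizer.zero_grad()
```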
Is there a way to make this work? I can think of the following options, both of which would work, but I don't think either is the best choice (both are sketched below):
- Change the precision from `bf16-mixed` to `bf16-true`. But as the BLOOM paper said, bfloat16 mixed-precision training can solve the instability problem.
- Reduce the context length (block size).
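To make the two options concrete, here is a minimal sketch of the kind of change I mean (the import path, config names, and default block size are my assumptions about how lit-llama is laid out, not a tested patch):

```python
import lightning as L
from lit_llama.model import LLaMA, LLaMAConfig  # import path assumed

# Option 1: full bf16 instead of mixed precision
# (no fp32 master copies, but loses the stability benefit BLOOM describes)
fabric = L.Fabric(devices=4, precision="bf16-true")  # instead of "bf16-mixed"
fabric.launch()

# Option 2: shorter context length, which shrinks activation memory
config = LLaMAConfig.from_name("7B")
config.block_size = 1024  # assumed default is 2048
model = LLaMA(config)
```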
Yes, those are both ways to reduce the memory requirement. I will also land a fix soon that brings back flash attention: https://github.com/Lightning-AI/lit-parrot/pull/171
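The core of that change is to route attention through PyTorch 2.0's fused kernel so the FlashAttention backend can be selected, roughly along these lines (a simplified sketch, not the exact diff from the PR):

```python
import torch.nn.functional as F

def causal_attention(q, k, v):
    # The fused kernel can dispatch to FlashAttention, which never materializes
    # the full (T x T) attention matrix, so activation memory drops sharply for
    # long block sizes. Passing an explicit attn_mask can force a fallback to a
    # slower, more memory-hungry backend, so the causal mask is left to the
    # kernel via is_causal=True.
    return F.scaled_dot_product_attention(q, k, v, dropout_p=0.0, is_causal=True)
```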
Hi @carmocca, can https://github.com/Lightning-AI/lit-parrot/pull/171 save some memory during pretraining and fine-tuning?
Yes. Would you like to port the changes from that PR here? Otherwise, I can do it myself.
Hi @carmocca, I am not familiar with that code, so I am afraid I can't port the changes from lit-parrot to lit-llama here.
@carmocca I made a PR for lit-llama following your PR. However, after I ran finetune/adapter.py on an A100, memory usage increased from 19.5 GB to 20.3 GB. Any idea what I did wrong?
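One way to narrow this down (a sketch, assuming PyTorch 2.0's SDPA backend controls; `run_finetuning_iterations` is a placeholder) is to force the flash backend, so a silent fallback to a slower kernel raises an error instead of just using more memory, and to compare peak allocation:

```python
import torch
from torch.backends.cuda import sdp_kernel

torch.cuda.reset_peak_memory_stats()

# forbid the math/mem-efficient fallbacks: if flash cannot be used
# (unsupported dtype, explicit mask, head size, ...), this raises an error
# instead of silently picking a backend that allocates more memory
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    run_finetuning_iterations()  # placeholder: a few forward/backward steps

print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```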