litgpt
Can I finetune falcon-7b with 8 GB VRAM?
Hi, I want to ask if it's possible to finetune falcon-7b with 8 GB of VRAM.
No, you need at least 48 GB of RAM, but it also depends on the size of your training dataset.
@thanhnew2001 So with 48 GB of RAM and a 16 GB GPU, do you think that is enough?
It still depends on the size of your dataset. Dolly 15k might be OK; Alpaca 50k might not be. I was successful with 80 GB of RAM.
I was able to fine tune falcon-7b and stablelm-7b with Alpaca 50K on a GPU with 24 GB of RAM, but I had to set micro_batch_size to 1 and use bf16-true. I could fine tune with both lora and adapter, but not adapter_v2, even when I set override_max_seq_length to 500. I doubt you'll be able to fine tune either with only 16 GB; the 7B parameters take ~14 GB of GPU RAM as bfloat16, and there are plenty of other structures in addition to the fine-tuning training dataset that take up GPU memory.
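To make that arithmetic explicit, here is a back-of-envelope sketch; only the 2-bytes-per-parameter weight term comes from the comment above, the rest is an illustrative assumption:

n_params = 7e9          # falcon-7b
bytes_per_param = 2     # bfloat16
weights_gb = n_params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")   # ~14 GB
# Gradients, optimizer state, activations, and the KV cache come on top of this,
# which is why 16 GB is tight and 24 GB already needs micro_batch_size = 1.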
There are some pretty smart folks who have managed to fine tune models with 4-bit floats (https://github.com/artidoro/qlora). You may want to look into their work.
@thanhnew2001 Thanks!
@patrickhwood Thank you so much! I am also creating a Google Colab (on its free tier) to finetune falcon-7b and other models.
No, it didn't work with that little memory. I tried many times, but if someone else could share their own experience, that would be great.
QLoRA finetuning support is tracked in #176. I have a PR open, #253, that adds QLoRA inference support.
Well done on the inference support for QLoRA, @carmocca! :tada: Do you know how hard it would be to also add QLoRA for finetuning? Is just initializing the model with quantization not enough? cf:
with fabric.init_module(empty_init=True), quantization(quantize):
    model = GPT(config)
Or do we need more changes in the finetuning process itself? I'm not super familiar with quantization yet, otherwise I would help.
That might be enough. I don't advertise that we support fine-tuning because I haven't played extensively with it to be confident. Feel free to give it a shot by adding the quantization argument to the script and instantiating the context manager as you said.
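For concreteness, a minimal sketch of what that could look like in one of the finetune scripts. The setup signature, the quantize argument name, the Fabric settings, and the model name are illustrative assumptions; the import location of quantization is assumed to match the generate scripts, and only the fabric.init_module / quantization / GPT(config) lines are taken from the snippet above:

from typing import Optional

import lightning as L

from lit_gpt import GPT, Config
from lit_gpt.utils import quantization  # assumed to live here, as in the generate scripts


def setup(quantize: Optional[str] = None) -> None:  # hypothetical CLI argument, e.g. "bnb.nf4"
    fabric = L.Fabric(devices=1, precision="bf16-true")
    fabric.launch()
    config = Config.from_name("falcon-7b")
    # Instantiate the model under the quantization context manager, as suggested above.
    with fabric.init_module(empty_init=True), quantization(quantize):
        model = GPT(config)
    # ... the rest of the finetuning script (optimizer, training loop) is unchanged ...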
The QLoRA paper also uses "Paged Optimizers", which appear to be a form of CPU offloading for optimizer state. The paper says:
While paged optimizers are critical to do 33B/65B QLORA tuning on a single 24/48GB GPU, we do not provide hard measurements for Paged Optimizers since the paging only occurs when processing mini-batches with long sequence lengths, which is rare.
So overall the pieces are there, somebody just needs to put them together and see if the numbers match those in the QLoRA paper
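If someone picks this up, the paged-optimizer piece is probably the smallest one. A sketch, assuming a bitsandbytes version that ships bnb.optim.PagedAdamW (the toy model and hyperparameters are placeholders, not values from this thread):

import torch
import bitsandbytes as bnb

# Hypothetical swap: replace torch.optim.AdamW with the paged variant so optimizer
# state can be paged out of GPU memory when it runs short (per the QLoRA paper).
model = torch.nn.Linear(512, 512).cuda()  # stand-in for the LoRA-wrapped GPT
optimizer = bnb.optim.PagedAdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

loss = model(torch.randn(4, 512, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()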
Thanks! Will try during the week.
I tried
with fabric.init_module(empty_init=True), quantization("bnb.int8"):
which worked fine after changing "bf16-mixed" to "bf16-true" and setting micro_batch_size = 1. The training and validation losses were just barely better than without quantization (still with "bf16-true"), although they were so close it could simply be statistical noise. With quantization("bnb.nf4"), the training loss was worse, but the validation loss was very close:
Training loss:
       Min    1st Qu.  Median  Mean   3rd Qu.  Max    Std
bf16   0.40   2.15     2.50    2.54   2.91     4.95   0.526
int8   0.39   2.15     2.49    2.52   2.89     4.92   0.520
nf4    0.50   2.35     2.72    2.76   3.15     5.38   0.559

Validation loss (sampled every 10 iterations):
       Min    1st Qu.  Median  Mean   3rd Qu.  Max    Std
bf16   10.62  10.69    10.73   10.72  10.76    10.83  0.045
int8   10.59  10.66    10.70   10.69  10.73    10.80  0.046
nf4    10.68  10.74    10.78   10.77  10.81    10.87  0.043
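In case anyone wants to produce the same kind of summary from their own runs, a small sketch (the loss values below are placeholders, not the numbers above):

import numpy as np

# Substitute the per-iteration losses printed by the finetune script.
losses = np.array([2.7, 2.5, 2.4, 2.6, 2.3, 2.2])

q1, median, q3 = np.percentile(losses, [25, 50, 75])
print(f"Min {losses.min():.2f}  1st Qu. {q1:.2f}  Median {median:.2f}  "
      f"Mean {losses.mean():.2f}  3rd Qu. {q3:.2f}  Max {losses.max():.2f}  "
      f"Std {losses.std(ddof=1):.3f}")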
This seems useful, so I'll submit a pull request adding quantization as a CLI option.