litgpt
Can I finetune falcon-7b with 8 GB VRAM?
Hi, I want to ask if it's possible to finetune falcon-7b with 8 GB of VRAM.
No, you need at least 48 GB of RAM, but it also depends on the size of your training dataset.
@thanhnew2001 So with 48 GB of RAM and a 16 GB GPU, do you think that is enough?
It still depends on the size of your dataset. Dolly 15k might be OK; Alpaca 50k might not be. I was successful with 80 GB of RAM.
I was able to fine tune falcon-7b and stablelm-7b with Alpaca 50K on a GPU with 24 GB of RAM, but I had to set micro_batch_size to 1 and use bf16-true. I could fine tune with both lora and adapter, but not adapter_v2, even when I set override_max_seq_length to 500. I doubt you'll be able to fine tune either with only 16 GB; the 7B parameters take ~14 GB of GPU RAM as bfloat16, and there are plenty of other structures in addition to the fine-tuning training dataset that take up GPU memory.
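To make that arithmetic explicit, here is a back-of-envelope sketch; only the 2-bytes-per-parameter weight term comes from the comment above, the rest is an illustrative assumption:

n_params = 7e9          # falcon-7b
bytes_per_param = 2     # bfloat16
weights_gb = n_params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")   # ~14 GB
# Gradients, optimizer state, activations, and the KV cache come on top of this,
# which is why 16 GB is tight and 24 GB already needs micro_batch_size = 1.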
There are some pretty smart folks who have managed to fine tune models with 4-bit floats (https://github.com/artidoro/qlora). You may want to look into their work.
@thanhnew2001 Thanks!
@patrickhwood Thank you so much! I am also creating a Google Colab (on its free tier) to finetune falcon-7b and other models.
No, it didn't work with that little memory. I tried many times, but if someone else could share their own experience, that would be great.
QLoRA finetuning support is tracked in #176. I have a PR open, #253, that adds QLoRA inference support.
Well done on the inference support for QLoRA, @carmocca! :tada: Do you know how hard it would be to also add QLoRA for finetuning? Is just initializing the model with quantization not enough? cf:
with fabric.init_module(empty_init=True), quantization(quantize):
    model = GPT(config)
Or do we need more changes in the finetuning process itself? I'm not super familiar with quantization yet, otherwise I would help.
That might be enough. I don't advertise that we support fine-tuning because I haven't played extensively with it to be confident. Feel free to give it a shot by adding the quantization argument to the script and instantiating the context manager as you said.
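For concreteness, a minimal sketch of what that could look like in one of the finetune scripts. The setup signature, the quantize argument name, the Fabric settings, and the model name are illustrative assumptions; the import location of quantization is assumed to match the generate scripts, and only the fabric.init_module / quantization / GPT(config) lines are taken from the snippet above:

from typing import Optional

import lightning as L

from lit_gpt import GPT, Config
from lit_gpt.utils import quantization  # assumed to live here, as in the generate scripts


def setup(quantize: Optional[str] = None) -> None:  # hypothetical CLI argument, e.g. "bnb.nf4"
    fabric = L.Fabric(devices=1, precision="bf16-true")
    fabric.launch()
    config = Config.from_name("falcon-7b")
    # Instantiate the model under the quantization context manager, as suggested above.
    with fabric.init_module(empty_init=True), quantization(quantize):
        model = GPT(config)
    # ... the rest of the finetuning script (optimizer, training loop) is unchanged ...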
The QLoRA paper also uses "Paged Optimizers", which appear to be a form of CPU offloading for optimizer state. The paper says:
While paged optimizers are critical to do 33B/65B QLORA tuning on a single 24/48GB GPU, we do not provide hard measurements for Paged Optimizers since the paging only occurs when processing mini-batches with long sequence lengths, which is rare.
So overall the pieces are there, somebody just needs to put them together and see if the numbers match those in the QLoRA paper
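If someone picks this up, the paged-optimizer piece is probably the smallest one. A sketch, assuming a bitsandbytes version that ships bnb.optim.PagedAdamW (the toy model and hyperparameters are placeholders, not values from this thread):

import torch
import bitsandbytes as bnb

# Hypothetical swap: replace torch.optim.AdamW with the paged variant so optimizer
# state can be paged out of GPU memory when it runs short (per the QLoRA paper).
model = torch.nn.Linear(512, 512).cuda()  # stand-in for the LoRA-wrapped GPT
optimizer = bnb.optim.PagedAdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

loss = model(torch.randn(4, 512, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()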
Thanks! Will try during the week.
I tried
with fabric.init_module(empty_init=True), quantization("bnb.int8"):
which worked fine after changing "bf16-mixed" to "bf16-true" and setting micro_batch_size = 1. The training and validation losses were just barely better than without quantization (still with "bf16-true"), although they were so close it could simply be statistical noise. With quantization("bnb.nf4"), the training loss was worse, but the validation loss was very close:
Training loss:
       Min    1st Qu.  Median  Mean   3rd Qu.  Max    Std
bf16   0.40   2.15     2.50    2.54   2.91     4.95   0.526
int8   0.39   2.15     2.49    2.52   2.89     4.92   0.520
nf4    0.50   2.35     2.72    2.76   3.15     5.38   0.559

Validation loss (sampled every 10 iterations):
       Min    1st Qu.  Median  Mean   3rd Qu.  Max    Std
bf16   10.62  10.69    10.73   10.72  10.76    10.83  0.045
int8   10.59  10.66    10.70   10.69  10.73    10.80  0.046
nf4    10.68  10.74    10.78   10.77  10.81    10.87  0.043
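In case anyone wants to produce the same kind of summary from their own runs, a small sketch (the loss values below are placeholders, not the numbers above):

import numpy as np

# Substitute the per-iteration losses printed by the finetune script.
losses = np.array([2.7, 2.5, 2.4, 2.6, 2.3, 2.2])

q1, median, q3 = np.percentile(losses, [25, 50, 75])
print(f"Min {losses.min():.2f}  1st Qu. {q1:.2f}  Median {median:.2f}  "
      f"Mean {losses.mean():.2f}  3rd Qu. {q3:.2f}  Max {losses.max():.2f}  "
      f"Std {losses.std(ddof=1):.3f}")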
This seems useful, so I'll submit a pull request adding quantization as a CLI option.