Finetune Falcon-40B with adapter_v2.py using 8 A100 80GB GPUs

Open weilong-web opened this issue 2 years ago • 13 comments

Has anyone finetuned Falcon-40B with adapter_v2 using 8 A100 GPUs? lora.py doesn't support multiple GPUs for now. I tried Falcon-7B with adapter_v2 on 8 GPUs and it worked, but not for 40B.

weilong-web avatar Jun 27 '23 14:06 weilong-web

I will try. Which hyperparameters are you using?

shuwang127 avatar Jun 27 '23 15:06 shuwang127

The same as the parameters inside adapter_v2.py (main branch), with the Alpaca dataset. I just tried 40B and ran out of memory.

weilong-web avatar Jun 27 '23 15:06 weilong-web

I tried using LoRA and ran into an OOM issue with the same machine configuration (8 A100 80GB GPUs).

Just curious, would the model even fit on a single machine with this configuration?

gpravi avatar Jun 27 '23 22:06 gpravi

In the Lightning blog, they claimed to be able to finetune a 40B Falcon, but they only provided an example using a 7B model.

Previously, I attempted to train the 40B model using Hugging Face's framework, with 8-bit quantization and LoRA, on 8 A100 GPUs. However, I could not train the unquantized version of the model, which led me to switch to Lit-Parrot instead. It appears that no one has successfully finetuned a 40B model without quantization on a single machine (8 A100 80GB). I hope this clarifies the situation.
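
For reference, a rough sketch of that Hugging Face setup (8-bit quantization plus LoRA via peft). The model name, target modules, and LoRA hyperparameters below are illustrative placeholders, not the exact configuration I used:

```python
# Rough sketch of an 8-bit + LoRA finetuning setup with transformers/peft.
# Values below are illustrative placeholders, not the exact configuration used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # 8-bit quantization via bitsandbytes
    device_map="auto",        # spread layers across the 8 GPUs
    trust_remote_code=True,   # Falcon shipped custom modeling code at the time
    torch_dtype=torch.bfloat16,
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```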

weilong-web avatar Jun 28 '23 06:06 weilong-web

I tried using LoRA and ran into an OOM issue with the same machine configuration (8 A100 80GB GPUs).

Just curious, would the model even fit on a single machine with this configuration?

How did you run LoRA distributed? I changed the device number and ran into a NotImplementedError.

sylviachency avatar Jun 30 '23 23:06 sylviachency

I just checked the code; it's not implemented for multiple GPUs. I guess they used previous code, not the main branch.
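
For illustration, the guard that triggers the error looks roughly like this (paraphrased, not the exact source):

```python
# Paraphrased sketch of the single-device check in the finetuning script
# (not the exact source); requesting more than one device hits this branch.
devices = 8

if devices > 1:
    raise NotImplementedError("Multi-GPU finetuning is not supported by this script yet.")
```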

weilong-web avatar Jul 01 '23 06:07 weilong-web

There’re previous versions that support multi gpu?

sylviachency avatar Jul 01 '23 06:07 sylviachency

I think so; they used DeepSpeed previously. Later, they changed to FSDP.

weilong-web avatar Jul 01 '23 09:07 weilong-web

Sorry, I was out for the last couple of days. Yes, previous versions tried to support multiple GPUs (through FSDP and DeepSpeed), but none of them worked, so it's been reverted to "not implemented".

gpravi avatar Jul 03 '23 17:07 gpravi

Sorry, I was out for the last couple of days. Yes, previous versions tried to support multiple GPUs (through FSDP and DeepSpeed), but none of them worked, so it's been reverted to "not implemented".

Thank you. Do you have any plans to implement multi-GPU support?

sylviachency avatar Jul 03 '23 23:07 sylviachency

@carmocca Any comment?

weilong-web avatar Jul 04 '23 07:07 weilong-web

LoRA distributed support is tracked in #161

carmocca avatar Jul 04 '23 12:07 carmocca

Regarding training Falcon-40B on 8 A100 80GB GPUs: I don't have access to that hardware, but you can try the suggestions in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md. You'll need to use sharding, as Falcon-40B doesn't fit in 80 GB. Additionally, adapter_v2 has more trainable parameters than adapter, so you might prefer the latter for such a large model.
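
Roughly, a sharded setup with Fabric's FSDP strategy might look like the sketch below. This is a minimal illustration that assumes lit_gpt's Block as the wrapping unit and the "falcon-40b" config name; it is not the actual finetuning script:

```python
# Minimal sketch of a sharded multi-GPU setup with Lightning Fabric's FSDPStrategy.
# Assumes lit_gpt.model.Block as the auto-wrap unit; illustrative only.
from functools import partial

import lightning as L
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from lit_gpt.model import GPT, Block, Config

auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
strategy = FSDPStrategy(
    auto_wrap_policy=auto_wrap_policy,  # shard at transformer-block granularity
    activation_checkpointing=Block,     # trade recompute for memory
)

fabric = L.Fabric(devices=8, precision="bf16-mixed", strategy=strategy)
fabric.launch()

config = Config.from_name("falcon-40b")
with fabric.init_module():
    model = GPT(config)
model = fabric.setup_module(model)
```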

carmocca avatar Jul 04 '23 12:07 carmocca

Where is the parameter cpu_offload located in the codebase? I didn't find it.

louisoutin avatar Jul 12 '23 19:07 louisoutin

Here (for example)
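
Roughly, it's an argument on the FSDP strategy rather than a flag of the finetuning script itself; a sketch (placement in the actual code may differ):

```python
# cpu_offload is forwarded to torch's FSDP through the strategy; sketch only.
from lightning.fabric.strategies import FSDPStrategy

strategy = FSDPStrategy(cpu_offload=True)  # offload sharded params to CPU to save GPU memory
```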

carmocca avatar Jul 12 '23 23:07 carmocca