Finetune Falcon-40B with adapter_v2.py using 8 A100 80GB GPUs
Has anyone finetuned Falcon-40B with adapter_v2 using 8 A100 GPUs?
lora.py doesn't support multiple GPUs for now.
I tried Falcon-7B with adapter_v2 using 8 GPUs and it worked, but not for 40B.
I will try it. Which hyperparameters are you using?
The same parameters as in adapter_v2.py (main branch), with the Alpaca dataset. I just tried 40B and ran out of memory.
I tried using LoRA and ran into an OOM issue with the same machine configuration (8 A100 80GB GPUs).
Just curious: would the model even fit on a single machine with this configuration?
In the Lightning blog, they claimed the ability to finetune Falcon-40B, but they only provided an example using the 7B model.
Previously, I attempted to train the 40B model with Hugging Face's framework, using 8-bit quantization and LoRA on 8 A100 GPUs. However, I ran into difficulties training the unquantized version of the model, which led me to switch to Lit-Parrot. It appears that no one has successfully finetuned a 40B model without quantization on a single machine (8 A100 80GB). I hope this clarifies the situation.
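For reference, here is a minimal sketch of the Hugging Face route described above (8-bit quantization plus LoRA via peft and bitsandbytes). The LoRA hyperparameters below are just for illustration, not my exact settings:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM

# Load Falcon-40B with bitsandbytes int8 weights, spread across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    load_in_8bit=True,
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Illustrative LoRA config; r/alpha/dropout are assumptions.
# "query_key_value" is Falcon's fused attention projection.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA weights remain trainable
```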
How did you run LoRA distributed? I changed the device count and ran into a NotImplementedError.
I just checked the code; it's not implemented for multiple GPUs. I guess he used a previous version, not the main branch.
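(For context, the error comes from an explicit guard, something like the sketch below; this is an illustration rather than the exact lit-gpt source:)

```python
# Illustrative guard, not the exact lit-gpt code: the script is written
# for a single device, and any other device count is rejected outright.
devices = 8  # changing this from the default of 1 triggers the error

if devices > 1:
    raise NotImplementedError("lora.py does not support multi-GPU training yet")
```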
Are there previous versions that support multi-GPU?
I think so; they used DeepSpeed previously. Later, they changed to FSDP.
Sorry, I was out for the last couple of days. Yes, previous versions tried to support multiple GPUs (through FSDP and DeepSpeed), but none of them worked, so it's been reverted to "not implemented".
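For anyone who wants to experiment anyway, a sharded setup with Lightning Fabric's FSDP strategy might look roughly like this. This is a sketch assuming Lightning 2.x and the lit_gpt.model.Block class, not a supported path in the current finetuning scripts:

```python
from functools import partial

import lightning as L
from lightning.fabric.strategies import FSDPStrategy
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

from lit_gpt.model import Block  # the transformer block to shard around

# Wrap each transformer block in its own FSDP unit and recompute its
# activations in the backward pass to save memory.
policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})
strategy = FSDPStrategy(auto_wrap_policy=policy, activation_checkpointing=Block)

fabric = L.Fabric(devices=8, precision="bf16-true", strategy=strategy)
fabric.launch()
```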
Thank you. Do you have any plans to implement multi-GPU support?
@carmocca Any comment?
LoRA distributed support is tracked in #161
Regarding training Falcon-40B on 8 A100 80GB GPUs: I don't have access to that hardware, but you can try the suggestions in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md. You'll need to use sharding, as Falcon-40B doesn't fit in 80 GB. Additionally, adapter_v2 has more trainable parameters than adapter, so you might prefer the latter for such a large model.
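For intuition on why sharding is unavoidable, a back-of-the-envelope estimate:

```python
# Falcon-40B weights alone, in 16-bit precision:
n_params = 40e9        # ~40 billion parameters
bytes_per_param = 2    # bf16/fp16
print(n_params * bytes_per_param / 1e9)  # ~80 GB: a full A100 80GB
# Activations, adapter gradients, and optimizer state come on top of that,
# so the weights have to be sharded across the 8 GPUs.
```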
Where is the cpu_offload parameter located in the codebase? I didn't find it.
Here (for example)
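(In other words, it's an argument to the FSDP strategy. A minimal illustration, assuming Lightning 2.x:)

```python
from lightning.fabric.strategies import FSDPStrategy

# Offload sharded parameters to CPU RAM between uses: slower, but it
# frees GPU memory for a model of this size.
strategy = FSDPStrategy(cpu_offload=True)
```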