falcon-40b out of memory
Hi! I am trying to finetune falcon-40b on a single A100 GPU with 80GB of memory. I tried decreasing the micro batch size to 1, but it still goes OOM for both adapter_v2 and lora with bfloat16-mixed / fp16. Any suggestion on how to solve this without using multiple GPUs? Thanks a lot!
I tried again with 2 A100 GPUs but still hit OOM. I set devices = 2 and tried both lora and adapter_v2. Any help would be appreciated!
Falcon 40B won't fit in a single 80GB card.
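A rough back-of-envelope estimate (my own numbers, not an official measurement) shows why the weights alone are already the problem:

```python
# Rough memory estimate for the Falcon-40B weights alone (ignores activations,
# gradients, optimizer state, and framework overhead).
n_params = 40e9

weights_bf16_gb = n_params * 2 / 1e9    # bf16/fp16: 2 bytes per parameter
weights_int4_gb = n_params * 0.5 / 1e9  # 4-bit quantized: ~0.5 bytes per parameter

print(f"bf16 weights: ~{weights_bf16_gb:.0f} GB")   # ~80 GB, fills the whole A100
print(f"4-bit weights: ~{weights_int4_gb:.0f} GB")  # ~20 GB, why 4-bit variants fit

# LoRA/adapters only train a small fraction of the parameters, but the frozen
# base weights still have to stay resident, so bf16 alone exceeds one card.
```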
I will report back when I find out what the minimum memory requirement to fine-tune it is, but I don't have access to an A100 80GB right now.
Any luck with finetuning? I'm running into OOM while trying to fine-tune Falcon 40B on an 8x A100 80 GB machine. I tried reducing num_devices and micro_batch_size and lowering the LoRA rank.
Update: it looks like the recent main doesn't support multi-GPU training. Any plans/threads to support that feature?
@gpravi Distributed support for LoRA is tracked in #161
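For reference, a sharded multi-GPU setup with Lightning Fabric's FSDP strategy would look roughly like the sketch below. This is illustrative only: lit-gpt does not wire this up for the LoRA script yet, and everything besides Fabric and FSDPStrategy is a placeholder.

```python
# Minimal sketch of sharded multi-GPU finetuning with Lightning Fabric.
# Not the current lit-gpt code path; shown only to contrast FSDP with DDP,
# which replicates the full model on every device.
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

fabric = L.Fabric(
    devices=2,                # two A100s
    precision="bf16-true",    # pure bf16 weights use less memory than bf16-mixed
    strategy=FSDPStrategy(),  # shards parameters and optimizer state across GPUs
)
fabric.launch()

# model, optimizer = ...  # build the GPT model and optimizer here, then:
# model, optimizer = fabric.setup(model, optimizer)
# With sharding, per-GPU memory drops roughly linearly with the device count.
```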
So currently it's not possible to finetune Falcon 40B using Lit-parrot, right?
@weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40B - #198
I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune
@lynngao
> I was able to finetune on the Falcon 40B instruct 4 bit version.

Were you able to finetune the full Falcon 40B model? I ran into this issue while saving the checkpoint
No, I only tried the 4-bit version.
> I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune

Were you able to run it with DDP or only on a single GPU?
> Were you able to finetune the full Falcon 40B model? I ran into this issue while saving the checkpoint

Downgrading bitsandbytes to 0.37.2 worked for me (it took me a few days to find this thread): https://github.com/TimDettmers/bitsandbytes/issues/324
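In case it helps others hitting the same error, the workaround is just a pin of the package version (nothing lit-gpt specific); you can confirm it took effect like this:

```python
# After running: pip install bitsandbytes==0.37.2
# check the installed version without importing the CUDA kernels:
from importlib.metadata import version

print(version("bitsandbytes"))  # expect "0.37.2" after the downgrade
```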
@alexeiga Nice. Can you please share your configuration?
Also, the current main branch doesn't implement multi-GPU training. How did you manage to implement it?
I tried to, but without success... I was only able to use a single GPU, and training is VERY slow.
QLoRA finetuning support is tracked in #176. Until that is supported, you can try the suggestions described in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md
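Until then, the knobs that tutorial suggests boil down to roughly the following. The variable names below are what I recall from the finetuning scripts and may differ between versions, so treat them as placeholders:

```python
# Sketch of the usual OOM-mitigation knobs from tutorials/oom.md, written as
# the kind of hyperparameters you would lower in finetune/lora.py.
# Names are placeholders and may not match your lit-gpt version exactly.

micro_batch_size = 1          # smallest per-step batch that still trains
batch_size = 128              # keep the effective batch via gradient accumulation
gradient_accumulation_iters = batch_size // micro_batch_size  # 128 accumulation steps

max_seq_length = 512          # truncate samples; activation memory scales with length
precision = "bf16-true"       # pure bf16 needs less memory than "bf16-mixed"

# LoRA-specific: a smaller rank means fewer trainable parameters and less
# optimizer state (the frozen base weights are unaffected).
lora_r = 4
```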