
falcon-40b out of memory

Open · lynngao opened this issue on Jun 18, 2023 · 13 comments

Hi! I am trying to finetune falcon-40b on a single A100 GPU with 80GB of memory. I tried decreasing the micro batch size to 1, but it still goes OOM for both adapter_v2 and lora with bfloat16-mixed / fp16. Any suggestions on how to solve this without using multiple GPUs? Thanks a lot!

lynngao avatar Jun 18 '23 06:06 lynngao

I tried again with 2 A100 GPUs but it still goes OOM. I set devices = 2 and tried both lora and adapter_v2. Any help would be appreciated!

lynngao avatar Jun 18 '23 09:06 lynngao

Falcon 40B won't fit in a single 80GB card.

I will report back when I find out what the minimum memory requirement to fine-tune it is, but I don't have access to an A100 80GB right now.

carmocca avatar Jun 19 '23 19:06 carmocca
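
(For context, a rough back-of-envelope estimate of why that is; the figures below count only the weights and ignore activations, gradients, and optimizer state, all of which make things worse:)

```python
# Approximate memory needed just to hold Falcon-40B's weights.
params = 40e9  # ~40 billion parameters

print(f"bf16 / fp16 weights: ~{params * 2 / 1e9:.0f} GB")  # ~80 GB -- fills an A100 80GB on its own
print(f"fp32 weights:        ~{params * 4 / 1e9:.0f} GB")  # ~160 GB -- what 'bfloat16-mixed' keeps,
                                                           # since mixed precision retains fp32 weights
```

LoRA and adapter finetuning only shrink the gradients and optimizer state down to the small set of trainable parameters; the frozen base weights still have to fit.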

Any luck with finetuning? I'm running into OOM while trying to fine-tune Falcon 40B on an 8-GPU A100 80GB machine. I tried reducing num_devices and micro_batch_size and lowering the LoRA rank.

Update: It looks like the recent main doesn't support multi-GPU training. Any plans/threads to support that feature?

gpravi avatar Jun 22 '23 23:06 gpravi

@gpravi Distributed support for LoRA is tracked in #161

carmocca avatar Jun 22 '23 23:06 carmocca

So currently it's not possible to finetune Falcon 40B using Lit-parrot, right?

weilong-web avatar Jun 27 '23 13:06 weilong-web

@weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - https://github.com/Lightning-AI/lit-gpt/issues/198

gpravi avatar Jun 28 '23 22:06 gpravi

> @weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - #198

I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune

lynngao avatar Jun 28 '23 22:06 lynngao
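
(falcontune finetunes a 4-bit quantized copy of the model, which is essentially why it fits where the half-precision weights don't; same rough estimate, with the same caveats as above:)

```python
# Same back-of-envelope estimate for a 4-bit quantized Falcon-40B.
params = 40e9

print(f"4-bit weights: ~{params * 0.5 / 1e9:.0f} GB")  # ~20 GB, leaving headroom on an 80GB A100
                                                       # for activations and the LoRA adapter
```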

@lynngao

I was able to finetune the Falcon 40B instruct 4-bit version.

Were you able to finetune the Falcon 40B model? I ran into this issue while saving the checkpoint.

gpravi avatar Jun 28 '23 22:06 gpravi

> @lynngao
>
> I was able to finetune the Falcon 40B instruct 4-bit version.
>
> Were you able to finetune the Falcon 40B model? I ran into this issue while saving the checkpoint.

No, I only tried the 4-bit version.

lynngao avatar Jun 28 '23 22:06 lynngao

> > @weilong-web Yeah, I don't think it works out of the box... Looks like someone managed to finetune Falcon 40b - #198
>
> I am able to use this tool to finetune 40b: https://github.com/rmihaylov/falcontune

Were you able to run it with DDP or only on a single GPU?

alexeiga avatar Jul 05 '23 08:07 alexeiga

> @lynngao
>
> I was able to finetune the Falcon 40B instruct 4-bit version.
>
> Were you able to finetune the Falcon 40B model? I ran into this issue while saving the checkpoint.

Downgrading bitsandbytes to 0.37.2 worked for me (it took me a few days to find this thread...): https://github.com/TimDettmers/bitsandbytes/issues/324

alexeiga avatar Jul 05 '23 08:07 alexeiga
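
(For anyone hitting the same checkpoint-saving error: the fix above is just pinning the package, e.g. `pip install bitsandbytes==0.37.2`, in the environment used for finetuning.)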

@alexeiga Nice. Can you please share your configuration?

Also, the current main branch doesn't implement multi-GPU training. How did you manage to get it working?

gpravi avatar Jul 05 '23 17:07 gpravi

> @alexeiga Nice. Can you please share your configuration?
>
> Also, the current main branch doesn't implement multi-GPU training. How did you manage to get it working?

I tried to, but without success... I was only able to run on a single GPU, and training is VERY slow.

alexeiga avatar Jul 11 '23 05:07 alexeiga

QLoRA finetuning support is tracked in #176. Until that is supported, you can try the suggestions described in https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/oom.md

carmocca avatar Jul 12 '23 12:07 carmocca
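
(For reference, the suggestions in that tutorial map onto the knobs in the finetuning scripts roughly as sketched below. The names and values are illustrative, assuming the finetune/lora.py layout at the time; check the version you are running for the exact constants and flags.)

```python
# Illustrative only -- in lit-gpt these lived as module-level constants near the
# top of finetune/lora.py; names may differ between versions.
micro_batch_size = 1                 # smallest per-step batch; the biggest single memory lever
batch_size = 128                     # effective batch size, preserved via gradient accumulation
gradient_accumulation_iters = batch_size // micro_batch_size

lora_r = 4                           # lower rank -> fewer trainable params and less optimizer state
lora_alpha = 16
lora_dropout = 0.05

# Running with true half precision instead of mixed precision avoids keeping an
# fp32 copy of the weights, e.g. (flag spelling may vary by version):
#   python finetune/lora.py --checkpoint_dir checkpoints/tiiuae/falcon-40b --precision bf16-true
```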