
Finetune 65B model on A100-80G with LoRA

Open coni-coco opened this issue 1 year ago • 5 comments

It seems that we should use tensor parallelism to save memory, but there is no tensor parallelism in the Hugging Face LLaMA module. Then, is it compatible with a LoRA adapter?
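There is no built-in tensor parallelism in the Hugging Face LLaMA implementation, but a LoRA adapter generally works on top of whatever placement accelerate gives the base model. Below is a minimal sketch of the usual workaround (assuming transformers, peft, accelerate, and bitsandbytes are installed; the checkpoint name is illustrative): shard the weights across GPUs with device_map="auto" (layer-wise, "naive" model parallelism rather than true tensor parallelism) and then attach the LoRA adapter with peft.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "decapoda-research/llama-65b-hf"  # illustrative checkpoint name

# device_map="auto" lets accelerate spread the layers over all visible GPUs;
# load_in_8bit stores the frozen base weights in 8-bit to cut VRAM further.
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Freeze the base model and cast norm layers so 8-bit training is stable.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

This is the same pattern alpaca-lora's finetune script uses for smaller models; whether the 65B weights actually fit depends on how many GPUs accelerate can shard them across.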

coni-coco avatar Mar 20 '23 12:03 coni-coco

Is tensor parallelism for training across multiple devices?

How many GPUs are needed to train 65B? I have been able to train 30B, but I am pretty sure that was the limit of my setup.

zachNA2 avatar Mar 20 '23 14:03 zachNA2

I think the bottleneck is VRAM

Do you have a guess for the amount of VRAM you'd need for 65B? I'd be curious to try it out

zachNA2 avatar Mar 20 '23 14:03 zachNA2

So my very rough, semi-educated guess is that it needs almost double the VRAM of the 30B model, assuming the model was compiled and attention was optimized.

Do you have a guess for the amount of VRAM you'd need for 65B? I'd be curious to try it out

You need at least 130 GB of VRAM to train the 65B model and 60 GB to train the 30B model. Using tensor parallelism to partition the 65B model's parameters across >= 3 A100-80G GPUs should work. Does anyone know how to implement LLaMA tensor parallelism with a LoRA adapter?
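For reference, 130 GB and 60 GB are roughly what the fp16 weights alone occupy (65B and 30B parameters × 2 bytes), before activations and the optimizer state for the LoRA parameters. A minimal sketch of spreading the weights over three A100-80G cards with the Hugging Face stack (assuming accelerate is installed; the checkpoint name and memory caps are illustrative) — note this is layer-wise model parallelism via accelerate, not true tensor parallelism:

```python
import torch
from transformers import LlamaForCausalLM

# Cap per-GPU memory so accelerate splits the 65B fp16 weights across
# three A100-80G cards (layer-wise sharding, not tensor parallelism).
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-65b-hf",   # illustrative checkpoint name
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", 2: "75GiB", "cpu": "120GiB"},
)
print(model.hf_device_map)  # shows which layers landed on which GPU
```

A LoRA adapter can then be attached with peft as usual; only the adapter weights receive gradients, so the extra training memory on top of the sharded base model stays small.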

coni-coco avatar Mar 21 '23 02:03 coni-coco

Hi, do you know how to implement LLaMA tensor parallelism with a LoRA adapter? @coni-coco

JiexingQi avatar Apr 12 '23 06:04 JiexingQi