
Finetune 65B model on A100-80G with LoRA

Open coni-coco opened this issue 1 year ago • 5 comments

It seems that we should use tensor parallelism to save memory, but there is no tensor parallelism in the Hugging Face LLaMA module. Then, is it compatible with a LoRA adapter?
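There is no built-in tensor parallelism in the Hugging Face LLaMA implementation, but a LoRA adapter generally works on top of whatever placement accelerate gives the base model. Below is a minimal sketch of the usual workaround (assuming transformers, peft, accelerate, and bitsandbytes are installed; the checkpoint name is illustrative): shard the weights across GPUs with device_map="auto" (layer-wise, "naive" model parallelism rather than true tensor parallelism) and then attach the LoRA adapter with peft.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "decapoda-research/llama-65b-hf"  # illustrative checkpoint name

# device_map="auto" lets accelerate spread the layers over all visible GPUs;
# load_in_8bit stores the frozen base weights in 8-bit to cut VRAM further.
model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Freeze the base model and cast norm layers so 8-bit training is stable.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

This is the same pattern alpaca-lora's finetune script uses for smaller models; whether the 65B weights actually fit depends on how many GPUs accelerate can shard them across.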

coni-coco avatar Mar 20 '23 12:03 coni-coco

Is tensor parallelism for training across multiple devices?

How many GPUs are needed to train 65B? I have been able to train 30B, but I am pretty sure that was the limit of my setup.

zachNA2 avatar Mar 20 '23 14:03 zachNA2

I think the bottleneck is VRAM

Do you have a guess for the amount of VRAM you'd need for 65B? I'd be curious to try it out

zachNA2 avatar Mar 20 '23 14:03 zachNA2

So my very rough, semi-educated guess is that it needs almost double the VRAM of the 30B model, assuming the model was compiled and attention was optimized.

Do you have a guess for the amount of VRAM you'd need for 65B? I'd be curious to try it out

You need at least 130 GB of VRAM to train the 65B model and 60 GB to train the 30B model. Using tensor parallelism to partition the 65B model's parameters across >= 3 A100-80G GPUs should work. Does anyone know how to implement LLaMA tensor parallelism with a LoRA adapter?
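For reference, 130 GB and 60 GB are roughly what the fp16 weights alone occupy (65B and 30B parameters × 2 bytes), before activations and the optimizer state for the LoRA parameters. A minimal sketch of spreading the weights over three A100-80G cards with the Hugging Face stack (assuming accelerate is installed; the checkpoint name and memory caps are illustrative) — note this is layer-wise model parallelism via accelerate, not true tensor parallelism:

```python
import torch
from transformers import LlamaForCausalLM

# Cap per-GPU memory so accelerate splits the 65B fp16 weights across
# three A100-80G cards (layer-wise sharding, not tensor parallelism).
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-65b-hf",   # illustrative checkpoint name
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", 2: "75GiB", "cpu": "120GiB"},
)
print(model.hf_device_map)  # shows which layers landed on which GPU
```

A LoRA adapter can then be attached with peft as usual; only the adapter weights receive gradients, so the extra training memory on top of the sharded base model stays small.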

coni-coco avatar Mar 21 '23 02:03 coni-coco

Hi, do you know how to implement LLaMA tensor parallelism with a LoRA adapter? @coni-coco

JiexingQi avatar Apr 12 '23 06:04 JiexingQi