alpaca-lora
Finetune 65B model on A100-80G with LoRA
It seems that we should use tensor parallelism to save memory, but there is no tensor parallelism in the definition of the Hugging Face LLaMA module. So, is it compatible with a LoRA adapter?
Is tensor parallelism for multi-device training?
How many GPUs are needed to train the 65B model? I have been able to train 30B, but I am pretty sure that was the limit of my setup.
I think the bottleneck is VRAM
Do you have a guess for the amount of VRAM you'd need for 65B? I'd be curious to try it out
So my very noobish guess is roughly double what the 30B model needs, assuming the model is compiled and attention is optimized.
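A quick back-of-envelope check on that guess (my own assumption of 1 byte per parameter in int8 and 2 bytes in fp16, ignoring activations, gradients, and LoRA optimizer state):

```python
# Rough weight-memory estimate: ~1 GB per billion parameters in int8,
# ~2 GB per billion in fp16. Activations and optimizer state come on top.
for params_b in (30, 65):
    print(f"{params_b}B weights: ~{params_b * 1:.0f} GB (int8), ~{params_b * 2:.0f} GB (fp16)")
```

The 65/30 ≈ 2.2x scaling is consistent with the "roughly double" guess.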
You need at least 130 GB of VRAM to train the 65B model and 60 GB to train the 30B model. Using tensor parallelism to partition the 65B model's parameters across >= 3 A100-80G GPUs should work. Does anyone know how to implement LLaMA tensor parallelism with a LoRA adapter?
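For what it's worth, here is a minimal sketch of one way to spread the 65B weights across several GPUs and attach a LoRA adapter, using Accelerate's device_map sharding (layer-wise model parallelism rather than true tensor parallelism) together with PEFT. The checkpoint name, max_memory budget, and LoRA hyperparameters are placeholders, not a tested recipe:

```python
# Sketch: shard LLaMA-65B across the visible GPUs with Accelerate's device_map
# and attach a LoRA adapter via PEFT. This places whole layers on different
# GPUs (pipeline-style), not tensor parallelism.
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "decapoda-research/llama-65b-hf"  # placeholder checkpoint name

model = LlamaForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,            # int8 weights to roughly halve memory vs fp16
    torch_dtype=torch.float16,
    device_map="auto",            # let Accelerate split layers across GPUs
    max_memory={i: "75GiB" for i in range(torch.cuda.device_count())},
)

# Newer PEFT versions call this prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With device_map="auto" the GPUs run the layers sequentially rather than in parallel; for real tensor parallelism you would need something like Megatron-LM or DeepSpeed, and how well those combine with a LoRA adapter is exactly the open question here.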
Hi, do you know how to implement LLaMA tensor parallelism with a LoRA adapter? @coni-coco