ms-swift
Training Llama 3.1 70B using 4× A6000
I need to train Llama 3.1 70B with 8-bit QLoRA (or something similar) on 4 × A6000 GPUs, i.e. 192 GB of total VRAM. Can I do it?
If not, what is the best way to train it within 192 GB of total VRAM?
You can try running it with LoRA:
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type llama3_1-70b-instruct \
--dataset <dataset> \
--num_train_epochs 5 \
--sft_type lora \
--output_dir output \
...
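If you specifically want QLoRA, note the sizing first: 70B parameters in bf16 are roughly 140 GB of weights alone, which is tight on 192 GB once activations and optimizer state are added, whereas 4-bit quantized weights take roughly 35 GB. Below is a minimal sketch, assuming your installed ms-swift version supports the `--quantization_bit` argument (check `swift sft --help`); `<dataset>` is a placeholder and the hyperparameters just mirror the LoRA example above:

# Hedged QLoRA sketch: --quantization_bit 4 loads the base model in 4-bit via
# bitsandbytes; 8 should also be accepted if you prefer 8-bit, at roughly
# double the weight memory.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
--model_type llama3_1-70b-instruct \
--dataset <dataset> \
--num_train_epochs 5 \
--sft_type lora \
--quantization_bit 4 \
--output_dir output \
...

With all four GPUs listed in CUDA_VISIBLE_DEVICES and no distributed launcher, swift should shard the model across them automatically (device_map-style model parallelism), so the quantized 70B model plus LoRA training state should fit comfortably in 192 GB.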