fsdp_qlora
Question about GPU memory usage.
Hi, I tried to fine-tune a Llama-2-7B model with HQQ-LoRA on dual GPUs. During "Loading & Quantizing Model Shards", the peak GPU memory usage reached 35 GB. What could be causing this? The run command is:
export CUDA_VISIBLE_DEVICES=3,4
python train.py \
--world_size 2 \
--model_name /workspace/model/Llama-2-7b-chat-hf \
--gradient_accumulation_steps 2 \
--batch_size 1 \
--context_length 4096 \
--num_epochs 1 \
--sharding_strategy full_shard \
--precision bf16 \
--train_type hqq_lora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset dummy \
--verbose true
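For reference, the per-GPU peak can be inspected with the standard torch.cuda memory statistics (a minimal sketch, not part of train.py; I observed the 35 GB figure by watching the GPUs while the shards were loading):

import torch

def report_peak_memory(tag: str) -> None:
    # max_memory_allocated / max_memory_reserved track the high-water mark of
    # PyTorch allocations on the current device since the last reset.
    device = torch.cuda.current_device()
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    reserved_gb = torch.cuda.max_memory_reserved(device) / 1024**3
    print(f"[{tag}] device {device}: peak allocated {peak_gb:.1f} GiB, "
          f"peak reserved {reserved_gb:.1f} GiB")

# e.g. call torch.cuda.reset_peak_memory_stats() before the shard-loading step
# and report_peak_memory("after loading") right after it, on each rank.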
Looking forward to your reply.