fsdp_qlora
Question about GPU memory usage.
Hi, I tried to fine-tune a Llama-2-7B model with HQQ-LoRA on dual GPUs. During "Loading & Quantizing Model Shards", the peak GPU memory usage reached 35 GB. What could be causing this? The run command is:
export CUDA_VISIBLE_DEVICES=3,4
python train.py \
--world_size 2 \
--model_name /workspace/model/Llama-2-7b-chat-hf \
--gradient_accumulation_steps 2 \
--batch_size 1 \
--context_length 4096 \
--num_epochs 1 \
--sharding_strategy full_shard \
--precision bf16 \
--train_type hqq_lora \
--use_gradient_checkpointing true \
--use_cpu_offload true \
--dataset dummy \
--verbose true
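For reference, the per-GPU peak can be inspected with the standard torch.cuda memory statistics (a minimal sketch, not part of train.py; I observed the 35 GB figure by watching the GPUs while the shards were loading):

import torch

def report_peak_memory(tag: str) -> None:
    # max_memory_allocated / max_memory_reserved track the high-water mark of
    # PyTorch allocations on the current device since the last reset.
    device = torch.cuda.current_device()
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    reserved_gb = torch.cuda.max_memory_reserved(device) / 1024**3
    print(f"[{tag}] device {device}: peak allocated {peak_gb:.1f} GiB, "
          f"peak reserved {reserved_gb:.1f} GiB")

# e.g. call torch.cuda.reset_peak_memory_stats() before the shard-loading step
# and report_peak_memory("after loading") right after it, on each rank.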
Looking forward to your reply.