
[BUG] Can your own PC train a DeepSpeed model?

kuokay opened this issue · 3 comments

[screenshot: error output; the error message includes a log path under /home/kuokay/...]

kuokay · Apr 21 '23

@kuokay Can you please share the log output? The path is stated in the error message (Log output: /home/kuokay/...)

mrwyattii · Apr 21 '23

My own computer runs Windows 11 with a 3060 graphics card. Can it train a 1.3b model?

kuokay · Apr 22 '23

Is this the 3060 with 12GB of memory? If so, you may be able to train the 1.3b model if you reduce the batch size to 1. I just tested and I was using ~12GB of memory with the following command:

deepspeed --num_gpus 1 main.py --model_name_or_path facebook/opt-1.3b --gradient_accumulation_steps 2 --lora_dim 128 --zero_stage 0 --deepspeed --output_dir ./output/ --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_checkpointing
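
For context on why 12GB is borderline here, a rough back-of-envelope estimate (my own numbers, not from the thread) of the memory a 1.3B-parameter model needs is sketched below. It ignores activations, CUDA context, and framework overhead, so treat it as a lower bound:

```python
# Rough memory estimate for a 1.3B-parameter model. Activation memory and
# framework overhead are ignored, so these figures are a lower bound only.
PARAMS = 1.3e9

def gib(num_bytes: float) -> float:
    return num_bytes / 1024**3

fp16_weights = 2 * PARAMS    # 2 bytes/param for fp16 weights
fp16_grads   = 2 * PARAMS    # 2 bytes/param for fp16 gradients
adam_states  = 12 * PARAMS   # fp32 master weights + momentum + variance (full fine-tune)

print(f"fp16 weights:   {gib(fp16_weights):.1f} GiB")
print(f"fp16 gradients: {gib(fp16_grads):.1f} GiB")
print(f"Adam states (full fine-tune): {gib(adam_states):.1f} GiB")
# Full fine-tuning alone would blow past 12 GiB, which is why the command
# above leans on LoRA (--lora_dim 128), batch size 1, and gradient checkpointing.
```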

This may still fail due to memory limitations on your system. However, we are working on support for an --offload feature that should further reduce the memory requirements to train these models.
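
The --offload flag mentioned above was still in progress at the time; it builds on DeepSpeed's existing ZeRO-Offload support, which can also be enabled directly through a DeepSpeed config. The sketch below is a minimal illustration of that mechanism, not the exact config DeepSpeed-Chat generates (the file name ds_config.json and the batch/accumulation values are my own, chosen to mirror the command above):

```python
# Minimal sketch: a DeepSpeed config that offloads optimizer state to CPU RAM
# (ZeRO-Offload), reducing GPU memory pressure at the cost of some speed.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 2,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                               # partition gradients + optimizer state
        "offload_optimizer": {"device": "cpu"},   # keep Adam states in host memory
    },
}

# Write the config out so it can be picked up by a training script, e.g. via
# deepspeed.initialize(..., config="ds_config.json") or whatever config hook
# the script exposes.
with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```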

mrwyattii · Apr 24 '23

Issue is stale, closing.

mrwyattii · Sep 14 '23