
Multiple GPU setup help

Open BotLifeGamer opened this issue 2 years ago • 10 comments

Hello, I haven't found a guide for a multiple-GPU setup for Kohya. Has anyone got a step-by-step guide? I keep getting errors trying to figure this out on my own, and there is no clear guide for it. It would be greatly appreciated if someone could point me in the right direction.

BotLifeGamer avatar Sep 09 '23 04:09 BotLifeGamer

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args...

If you have 4 gpus and one machine, give args as accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...
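To make the relationship between the flags concrete, here is a minimal sketch in plain Python (no accelerate installation needed) that assembles the launch command from a list of GPU ids. `build_launch_cmd` is a hypothetical helper for illustration, not part of sd-scripts:

```python
def build_launch_cmd(gpu_ids, script="train_network.py", extra_args=()):
    """Assemble an `accelerate launch` command line for single-machine DDP.

    num_processes must equal the number of GPUs used on this machine;
    gpu_ids selects which physical devices those processes bind to.
    """
    cmd = [
        "accelerate", "launch",
        f"--num_processes={len(gpu_ids)}",
        "--num_machines=1",
        "--multi_gpu",
        f"--gpu_ids={','.join(str(i) for i in gpu_ids)}",
        script,
    ]
    cmd.extend(extra_args)
    return cmd

# Four GPUs on one machine:
print(" ".join(build_launch_cmd([0, 1, 2, 3])))
# accelerate launch --num_processes=4 --num_machines=1 --multi_gpu --gpu_ids=0,1,2,3 train_network.py
```

The point is that `--num_processes` and `--gpu_ids` must agree: one process per listed GPU.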

BootsofLagrangian avatar Sep 09 '23 13:09 BootsofLagrangian

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args...

If you have 4 gpus and one machine, give args as accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...

Thanks for the reply. I'm slowly learning everything as I go; a friend and I spent hours trying to figure it out before I asked, and I read the previous posts. So where do the args go, and into what file? Into train_network.py?

BotLifeGamer avatar Sep 09 '23 15:09 BotLifeGamer

When training on Paperspace Gradient with two A6000 GPUs, running accelerate config from the terminal was enough to get training working with bmaltais/kohya_ss. When training with sd-scripts directly, I also recall being able to use multiple GPUs just by using accelerate, without setting any extra arguments.

In which compute environment are you running?
This machine
Which type of machine are you using?
No distributed training
Do you want to run your training on CPU only (even if a GPU / Apple Silicon device is available)? [yes/NO]: NO
Do you wish to optimize your script with torch dynamo? [yes/NO]: NO
Do you want to use DeepSpeed? [yes/NO]: NO
**What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]: all**
Do you wish to use FP16 or BF16 (mixed precision)?
fp16
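For reference, accelerate config saves these answers to a YAML file (by default under ~/.cache/huggingface/accelerate/default_config.yaml). Note the transcript above chose "No distributed training"; for two GPUs on one machine the resulting config would instead look something like the sketch below (field names follow accelerate's config format; exact contents vary by version):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 2
gpu_ids: all
mixed_precision: fp16
use_cpu: false
```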

NEXTAltair avatar Sep 09 '23 23:09 NEXTAltair

accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] "train_network.py" args... If you have 4 gpus and one machine, give args as accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py" args...

Thanks for the reply. I'm slowly learning everything as I go; a friend and I spent hours trying to figure it out before I asked, and I read the previous posts. So where do the args go, and into what file? Into train_network.py?

You can list the arguments of train_network.py with the following command, run in a terminal or prompt from the sd-scripts directory.

python train_network.py -h
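Under the hood this is standard Python argparse behavior: -h prints every registered flag with its help string. A tiny self-contained sketch of the same pattern (the flag names below are illustrative, not sd-scripts' real definitions):

```python
import argparse

# Toy sketch of a training script's CLI; -h would print the help strings below.
parser = argparse.ArgumentParser(description="toy sketch of a training script CLI")
parser.add_argument("--train_data_dir", type=str, help="parent folder of image folders")
parser.add_argument("--resolution", type=str, default="512,512", help="training resolution")
parser.add_argument("--train_batch_size", type=int, default=1, help="batch size per device")

# Parsing an explicit argument list, as accelerate would forward them:
args = parser.parse_args(["--train_data_dir", "./training/img", "--resolution", "1024,1024"])
print(args.train_data_dir, args.resolution, args.train_batch_size)
# ./training/img 1024,1024 1
```

So the args don't go into any file: they are appended to the command line after "train_network.py", and accelerate forwards them to the script.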

And if you want to use multiple GPUs in sd-scripts, you need to know what the accelerate library is.

BootsofLagrangian avatar Sep 10 '23 13:09 BootsofLagrangian

accelerate launch --num_processes=4 --multi_gpu --num_machines=1 --gpu_ids=0,1,2,3 "train_network.py"

Does this look like I'm on the right path?

D:\Kohya_ss\kohya_ss>accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 "train_network.py" -- --resolution 1024

NOTE: Redirects are currently not supported in Windows or MacOs.
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [AIBOT]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [AIBOT]:29500 (system error: 10049 - The requested address is not valid in its context.).

(the following messages are printed once per process, i.e. twice)

prepare tokenizer
Using DreamBooth method.
prepare images.
0 train images with repeating. 0 reg images.
no regularization images
[Dataset 0]
batch_size: 1
resolution: (1024, 1024)
enable_bucket: False
loading image sizes.
0it [00:00, ?it/s]
prepare dataset
No data found. Please verify arguments (train_data_dir must be the parent of folders with images)
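The "No data found" error means --train_data_dir pointed at the image folder itself. sd-scripts expects the parent directory, whose subfolders are named with a repeat count prefix, e.g. "10_mychar". Here is a sketch that builds such a layout and parses the repeat count; parse_repeats is an illustrative helper, not sd-scripts' actual code:

```python
import os
import tempfile

root = tempfile.mkdtemp()
# train_data_dir must be the PARENT of folders like "10_mychar";
# the leading number is how many times each image is repeated per epoch.
img_dir = os.path.join(root, "training", "img")
subset = os.path.join(img_dir, "10_mychar")
os.makedirs(subset)
open(os.path.join(subset, "example.png"), "wb").close()

def parse_repeats(folder_name):
    # Illustrative: take the numeric prefix before the first underscore.
    return int(folder_name.split("_", 1)[0])

repeats = parse_repeats(os.path.basename(subset))
print(repeats)
# 10
```

With this layout, you would pass --train_data_dir pointing at the "img" directory, not at "10_mychar".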

BotLifeGamer avatar Sep 13 '23 19:09 BotLifeGamer

@BotLifeGamer

Here is an example command line for training a LoRA:

accelerate launch --num_processes=2 --multi_gpu --num_machines=1 --gpu_ids=0,1 "train_network.py" ^
  --pretrained_model_name_or_path=[huggingface_path or base model path to use] ^
  --network_module=networks.lora --save_model_as=safetensors ^
  --caption_extension=".txt" --seed="42" ^
  --training_comment=[some comment] --output_name=[output_model_name] ^
  --train_data_dir=./training/img --output_dir=./training/model --logging_dir=./training/logs ^
  --network_alpha=[LINEAR_ALPHA] --network_dim=[LINEAR_RANK] ^
  --network_args "conv_rank=[CONV_RANK]" "conv_alpha=[CONV_ALPHA]" ^
  --resolution=%RESOLUTION% --train_batch_size=%BATCH_SIZE% ^
  --learning_rate=%LEARNING_RATE% --unet_lr=%UNET_LR% --text_encoder_lr=%TE_LR% ^
  --max_train_steps=%TRAINING_STEP% --lr_warmup_steps=%WARMUP_STEP% ^
  --save_every_n_epochs=1 --lr_scheduler=%LR_SCHEDULER% --lr_scheduler_num_cycles=%LR_CYCLES% ^
  --optimizer_type=%OPTIMIZER% --optimizer_args %OPTIMIZER_ARGS% --max_grad_norm=1.0 ^
  --noise_offset=%NOISE_OFFSET% --mixed_precision=%PRECISION% --save_precision=%PRECISION% ^
  --enable_bucket --bucket_no_upscale --random_crop --bucket_reso_steps=%BUCKET_RESO_STEPS% ^
  --max_token_length=225 --shuffle_caption --xformers --gradient_checkpointing ^
  --persistent_data_loader_workers

If you want to do full fine-tuning of a model, use "fine_tune.py" instead of "train_network.py".

BootsofLagrangian avatar Sep 16 '23 05:09 BootsofLagrangian

What is the setup for two machines on the same network? I am failing to get that part set up. My second machine seems to be right, but on the main one I have no idea what to put for the IP and port, because when I run a training it says the port is already in use (by the Kohya UI itself, running on the main machine).

Charmandrigo avatar Jan 13 '24 00:01 Charmandrigo

@Charmandrigo Sorry, I only have experience with single-machine training. But accelerate supports multi-machine training: if you run accelerate config, you will find options for multi-machine DDP. As of now, kohya's sd-scripts supports only DDP, not ZeRO or FSDP.
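accelerate launch exposes multi-node flags for this: --num_machines, --machine_rank, --main_process_ip, and --main_process_port. The port must be one that is actually free on the main machine, i.e. not the port the Kohya UI is already listening on. A sketch that builds the command for each node, assuming two machines and a made-up main-machine IP of 192.168.1.10:

```python
def node_cmd(machine_rank, gpus_per_node=1,
             main_ip="192.168.1.10", main_port=29555):
    """Build the accelerate launch line for one node of a 2-machine job.

    main_port is arbitrary here (29555), chosen to avoid whatever port the
    Kohya UI occupies; num_processes counts processes across BOTH machines.
    """
    return (
        "accelerate launch "
        f"--num_processes={2 * gpus_per_node} "
        "--num_machines=2 --multi_gpu "
        f"--machine_rank={machine_rank} "
        f"--main_process_ip={main_ip} --main_process_port={main_port} "
        "train_network.py ..."
    )

print(node_cmd(0))  # run this on the main machine (rank 0)
print(node_cmd(1))  # run this on the second machine (rank 1)
```

Both machines get the same IP and port (those of the main machine); only --machine_rank differs between them.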

BootsofLagrangian avatar Jan 30 '24 11:01 BootsofLagrangian

@BootsofLagrangian Are your 4 GPUs from the same brand? Do you know if it's possible to use AMD alongside NVIDIA?

filipemeneses avatar May 28 '25 14:05 filipemeneses

@BootsofLagrangian Are your 4 GPUs from the same brand? Do you know if it's possible to use AMD alongside NVIDIA?

Yes, 4x RTX 3090. Heterogeneous-device training is a really challenging topic; it is not recommended at the consumer level.

BootsofLagrangian avatar May 29 '25 00:05 BootsofLagrangian