Dreambooth example with multi-gpu significantly slower than single GPU

[Open] subpanic opened this issue 2 years ago • 8 comments

Describe the bug

If the accelerate config is set up for multi-GPU (the default config works), training speed drops dramatically. With a 2x 3090 system I'm seeing training go from ~2 it/s with a single 3090 configured to ~6 s/it with multi-GPU configured. Note the unit flip: that's roughly a 12x increase in per-step time.

I'm guessing I'm either misunderstanding what should be happening in a multi-GPU context, or more configuration is needed?

Reproduction

Generate the default config with accelerate config default and then run the dreambooth script with the following args:

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=/store/sd/diffusers_models/stable-diffusion-2-1 \
  --instance_data_dir=/path/to/data \
  --output_dir=/path/to/out \
  --instance_prompt="instance prompt here" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=5000 \
  --train_text_encoder \
  --enable_xformers_memory_efficient_attention \
  --sample_batch_size=4 \
  --checkpointing_steps=500 \
  --use_8bit_adam

Note that I'm seeing the same behavior with any value of --mixed_precision if that's also supplied.

Logs

With multi-GPU (2x 3090):
[00:40<7:58:13,  5.75s/it, loss=0.175, lr=2e-6]

With a single 3090:
[00:11<42:23,  1.96it/s, loss=0.299, lr=2e-6]
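
A back-of-the-envelope throughput comparison (assuming one image per GPU per step, as in the args above): the single GPU processes ~1.96 images/s, while the two GPUs together process ~2/5.75 ≈ 0.35 images/s. So even after crediting the second GPU's batch, this is roughly a 5-6x throughput regression.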

System Info

Latest diffusers commit (debc74f).

I can provide any module versions people think are relevant, but I generally have the latest versions from requirements.txt installed.

subpanic · Dec 28 '22

Can you share the contents of your .cache/huggingface/accelerate/default_config.yaml file? It'll help in understanding whether accelerate is able to find both your GPUs.

The path to the file should have been printed when you generated your config; alternatively, you can run accelerate config default again and it will point to the already-existing YAML file.
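
For example, assuming the default cache location under your home directory:

cat ~/.cache/huggingface/accelerate/default_config.yaml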

Abhinay1997 · Dec 29 '22

Sure, here is the config:

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: 0,1
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

Can also confirm both GPUs are being used. Watching nvidia-smi, I can see them both fully loading VRAM for training and running at ~100% utilization during training.
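
(For anyone wanting to reproduce the check, this is just the stock monitoring tool, e.g. watch -n 1 nvidia-smi.)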

subpanic · Dec 29 '22

cc @patil-suraj @williamberman @pcuenca

patrickvonplaten · Jan 05 '23

Same problem here. Any solutions?

Teoge · Jan 20 '23

cc @patil-suraj again ;-)

patrickvonplaten · Jan 23 '23

This is similar to issue #1734; it was answered here: https://github.com/huggingface/diffusers/issues/1734#issuecomment-1366017170.
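
For context (standard data-parallel behavior with accelerate, not a summary of the linked answer): each process consumes its own batch per optimizer step, so the effective batch size is train_batch_size × gradient_accumulation_steps × num_processes, i.e. 1 × 1 × 2 = 2 images per step with the args above. That alone would explain at most a ~2x increase in s/it; the throughput comparison under Logs shows a larger gap.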

patil-suraj · Jan 23 '23

I'll add some docs to the Dreambooth README for multi-GPU training.
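
In the meantime, a minimal sketch of a multi-GPU launch (flag names per accelerate's CLI; verify against accelerate launch --help for your version):

accelerate launch --multi_gpu --num_processes=2 train_dreambooth.py <same args as above>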

williamberman · Jan 23 '23

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Feb 17 '23