Dreambooth example with multi-gpu significantly slower than single GPU
Describe the bug
If the accelerate config is set up for multi-GPU (the default config works), training speed appears to slow dramatically. With a 2x 3090 system I'm seeing training go from ~2 it/s with a single 3090 configured to ~6 s/it with multi-GPU configured.
I'm guessing I'm either misunderstanding what should be happening in a multi-GPU context, or there's more configuration needed?
Reproduction
Generate the default config with accelerate config default and then run the dreambooth script with the following args:
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=/store/sd/diffusers_models/stable-diffusion-2-1 \
  --instance_data_dir=/path/to/data \
  --output_dir=/path/to/out \
  --instance_prompt="instance prompt here" \
  --resolution=768 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=5000 \
  --train_text_encoder \
  --enable_xformers_memory_efficient_attention \
  --sample_batch_size=4 \
  --checkpointing_steps=500 \
  --use_8bit_adam
Note: I'm seeing the same behaviour with any value of --mixed_precision if that's also supplied.
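For reference, the same multi-GPU settings can also be passed explicitly to accelerate launch instead of relying on the saved config. A rough sketch using the standard accelerate launch flags (script arguments kept the same as above):

accelerate launch --multi_gpu --num_processes=2 --mixed_precision=fp16 train_dreambooth.py \
  --pretrained_model_name_or_path=/store/sd/diffusers_models/stable-diffusion-2-1 \
  # ... same remaining script arguments as above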
Logs
With multi-GPU (2x 3090):
[00:40<7:58:13, 5.75s/it, loss=0.175, lr=2e-6]
Single 3090:
[00:11<42:23, 1.96it/s, loss=0.299, lr=2e-6]
System Info
Latest diffusers commit [debc74f]
I can provide any module versions people think are relevant, but I generally have the latest requirements.txt installed.
Can you share the contents of your .cache/huggingface/accelerate/default_config.yaml file? It'll help in understanding whether accelerate is able to find both of your GPUs.
The path to the file should have been listed when you previously generated your config, or you can run accelerate config default and it should point to the already existing yaml file.
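For example, assuming the config lives in the default location, something like this should print it:

cat ~/.cache/huggingface/accelerate/default_config.yaml

(accelerate env also prints the current default config along with version information.)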
Sure, here is the config:
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: 0,1
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
I can also confirm both GPUs are being used. Watching nvidia-smi, I can see them both fully loading VRAM for training and sitting at ~100% utilization during training.
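For completeness, a generic way to double-check that both cards are visible to PyTorch (nothing specific to this setup):

nvidia-smi -L
python -c "import torch; print(torch.cuda.device_count())"  # should print 2 on a 2x 3090 machine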
cc @patil-suraj @williamberman @pcuenca
Same problem here. Any solutions?
cc @patil-suraj again ;-)
Similar issue: #1734, answered here: https://github.com/huggingface/diffusers/issues/1734#issuecomment-1366017170.
I'll add some docs to the dreambooth README for multi-GPU training.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.