diffusers AttributeError: Can't pickle local object 'main.<locals>.<lambda>'

Describe the bug

On Windows with conda, I get this error with the current diffusers version when I start examples/dreambooth/train_dreambooth.py: AttributeError: Can't pickle local object 'main.<locals>.<lambda>'

I solved it by setting the DataLoader's num_workers=0 at line 573: https://github.com/huggingface/diffusers/blob/8b7cb962a56fb9ca5783feed605c895b3abdf7fa/examples/dreambooth/train_dreambooth.py#L573

    train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=args.train_batch_size,
        shuffle=True,
        collate_fn=lambda examples: collate_fn(examples, args.with_prior_preservation),
        # set to 0 locally: no worker processes are started, so the lambda is never pickled
        num_workers=0,
    )
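A minimal stdlib sketch of why num_workers=0 avoids the error, assuming the cause is how worker processes are started: on Windows, DataLoader workers use the "spawn" start method, so collate_fn has to be pickled and sent to each worker, and a lambda defined inside main() cannot be pickled. The collate_fn below is a hypothetical stand-in for the script's function, and functools.partial over a module-level function is shown as one picklable alternative, not necessarily what diffusers itself uses.

```python
import functools
import pickle


def collate_fn(examples, with_prior_preservation):
    # Hypothetical stand-in for the training script's collate function
    return len(examples), with_prior_preservation


def main():
    with_prior_preservation = True
    # Local lambda, like the one passed as collate_fn in the script
    local_lambda = lambda examples: collate_fn(examples, with_prior_preservation)
    # partial of a module-level function pickles collate_fn by reference
    partial_fn = functools.partial(collate_fn, with_prior_preservation=with_prior_preservation)

    results = {}
    for name, fn in [("lambda", local_lambda), ("partial", partial_fn)]:
        try:
            pickle.dumps(fn)
            results[name] = "ok"
        except (AttributeError, pickle.PicklingError) as exc:
            # Older Pythons raise AttributeError here; newer ones may raise PicklingError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    print(main())
```

With the "fork" start method, long the default on Linux, workers inherit collate_fn without pickling it, which may explain why the error does not show up on every machine.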

I didn't have this problem when training the latest Stable Diffusion 2.1 512px base model, only with the 768px version. However, I can't rule out that I changed something on my side.

Any idea what the cause is? And would it be wise to change this in the repo?

Reproduction

Logs

No response

System Info

diffusers 0.10.2

Dec 13 '22 17:12 djdookie

Hey @djdookie,

Could you please post a fully reproducible code snippet?

Dec 19 '22 11:12 patrickvonplaten

Ideally, just the bash command you used to run DreamBooth.

Dec 19 '22 11:12 patrickvonplaten

I've run this command:

    SET MODEL_NAME="stabilityai/stable-diffusion-2-1"
    SET INSTANCE_DIR="some/path/to/training_pics"
    SET CLASS_DIR="some/path/to/class_pics"
    SET OUTPUT_DIR="some/path/to/output_directory"

    accelerate launch examples\dreambooth\train_dreambooth.py ^
        --pretrained_model_name_or_path=%MODEL_NAME% ^
        --revision="fp16" ^
        --instance_data_dir=%INSTANCE_DIR% ^
        --class_data_dir=%CLASS_DIR% ^
        --output_dir=%OUTPUT_DIR% ^
        --with_prior_preservation --prior_loss_weight=1.0 ^
        --seed=1337 ^
        --resolution=768 ^
        --train_batch_size=1 ^
        --train_text_encoder ^
        --mixed_precision="fp16" ^
        --gradient_accumulation_steps=1 --gradient_checkpointing ^
        --use_8bit_adam ^
        --learning_rate=1.72e-6 ^
        --lr_scheduler="constant" ^
        --lr_warmup_steps=0 ^
        --num_class_images=200 ^
        --max_train_steps=10000 ^
        --checkpointing_steps=500 ^
        --instance_prompt="photo of xyz person" ^
        --class_prompt="photo of person"

Dec 21 '22 13:12 djdookie

Hmm, interesting. I cannot reproduce this error on a single GPU, and I don't have access to a multi-GPU setup at the moment.

@patil-suraj @pcuenca, any ideas here by any chance? I suspect that this comes from datasets.

Jan 02 '23 17:01 patrickvonplaten

Similar issue: https://github.com/huggingface/diffusers/issues/2041, and its fix: #2070

Jan 23 '23 09:01 patil-suraj
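On the question of whether the repo should change: hard-coding num_workers=0 gives up parallel data loading on every platform. The helper safe_num_workers below is hypothetical (not a diffusers or PyTorch API) and sketches a narrower fallback under the same assumption as above: keep the requested worker count unless the start method requires pickling collate_fn and collate_fn cannot be pickled.

```python
import functools
import multiprocessing
import pickle


def safe_num_workers(collate_fn, requested_workers):
    """Hypothetical helper (not a diffusers or PyTorch API).

    Returns requested_workers when worker processes can receive collate_fn,
    otherwise 0 so that data loading happens in the main process.
    """
    if requested_workers <= 0:
        return 0
    # With "fork", workers inherit collate_fn from the parent without pickling it
    if multiprocessing.get_start_method() == "fork":
        return requested_workers
    # With "spawn" (Windows) or "forkserver", collate_fn is pickled for the workers
    try:
        pickle.dumps(collate_fn)
    except (AttributeError, TypeError, pickle.PicklingError):
        return 0
    return requested_workers


def collate_fn(examples, with_prior_preservation):
    # Hypothetical stand-in for the training script's collate function
    return examples, with_prior_preservation


def main():
    picklable = functools.partial(collate_fn, with_prior_preservation=True)
    unpicklable = lambda examples: collate_fn(examples, True)  # local lambda
    return safe_num_workers(picklable, 4), safe_num_workers(unpicklable, 4)


if __name__ == "__main__":
    print(main())
```

The simpler design choice is to make collate_fn picklable in the first place (for example with functools.partial), since then no fallback is needed at all.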

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Feb 16 '23 15:02 github-actions[bot]