diffusers

Training Lora Fails after latest update

Open pedropaf opened this issue 5 months ago • 4 comments

Describe the bug

After updating to the latest version, the script to train a LoRA fails. The error is attached below.

Reproduction

accelerate launch train_dreambooth_lora_sdxl_advanced.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --dataset_name='boby-set' \
  --instance_prompt="photo of a TOK dog" \
  --validation_prompt="a TOK dog in illustration style full body" \
  --output_dir='boby-sdxl-lora-650' \
  --caption_column="prompt" \
  --mixed_precision="bf16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --repeats=1 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=1.0 \
  --text_encoder_lr=1.0 \
  --optimizer="prodigy" \
  --train_text_encoder_ti \
  --train_text_encoder_ti_frac=0.5 \
  --snr_gamma=5.0 \
  --lr_scheduler="costant" \
  --lr_warmup_steps=0 \
  --rank=32 \
  --max_train_steps=650 \
  --checkpointing_steps=2000 \
  --seed="0"

Logs

03/15/2024 19:09:30 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: bf16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'variance_type', 'thresholding', 'clip_sample_range', 'rescale_betas_zero_snr'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/home/pedro/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 2366, in <module>
    main(args)
  File "/home/pedro/projects/diffusers/examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py", line 1275, in main
    text_encoder_one = text_encoder_cls_one.from_pretrained(
  File "/home/pedro/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2362, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: CLIPTextModel.__init__() got an unexpected keyword argument 'variant'
Traceback (most recent call last):
  File "/home/pedro/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/pedro/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/home/pedro/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "/home/pedro/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth_lora_sdxl_advanced.py', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-base-1.0', '--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix', '--dataset_name=boby-set', '--instance_prompt=photo of a TOK dog', '--validation_prompt=a TOK dog in illustration style full body', '--output_dir=boby-sdxl-lora-650', '--caption_column=prompt', '--mixed_precision=bf16', '--resolution=1024', '--train_batch_size=1', '--repeats=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--learning_rate=1.0', '--text_encoder_lr=1.0', '--optimizer=prodigy', '--train_text_encoder_ti', '--train_text_encoder_ti_frac=0.5', '--snr_gamma=5.0', '--lr_scheduler=costant', '--lr_warmup_steps=0', '--rank=32', '--max_train_steps=650', '--checkpointing_steps=2000', '--seed=0']' returned non-zero exit status 1.
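
The last frame of the first traceback shows from_pretrained forwarding its leftover keyword arguments to the model constructor (cls(config, *model_args, **model_kwargs)), which is where 'variant' is rejected. Below is a minimal sketch of that mechanism. It uses hypothetical stand-ins (ToyModel, KNOWN_LOADER_KWARGS), not the real transformers code, and it assumes, without confirmation from this thread, that the loader did not treat 'variant' as one of its own options:

```python
class ToyModel:
    """Hypothetical stand-in for a model class; this is not the transformers API."""

    # Options the toy loader consumes itself (an assumed set, for illustration only).
    KNOWN_LOADER_KWARGS = {"revision", "subfolder"}

    def __init__(self, config):
        self.config = config

    @classmethod
    def from_pretrained(cls, name, **kwargs):
        # Remove the options the loader understands ...
        for key in cls.KNOWN_LOADER_KWARGS:
            kwargs.pop(key, None)
        # ... and forward whatever is left to the constructor.
        # Any option the loader does not know ends up here and is rejected.
        return cls({"name": name}, **kwargs)


def try_load(**kwargs):
    """Return the TypeError message, or None when loading succeeds."""
    try:
        ToyModel.from_pretrained("toy", **kwargs)
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(try_load(revision="main"))  # a known option is consumed by the loader
    print(try_load(variant="fp16"))   # an unknown option reaches __init__ and fails
```

If that assumption holds, the error would depend on which options the installed loader recognizes, which would fit the report later in the thread that the error no longer appears on the latest main branch.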

System Info

diffusers main branch, peft main branch.
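
Since the traceback passes through transformers and accelerate as well, exact installed versions can make a report like this easier to reproduce. Below is a small sketch that prints them using only the standard library. The helper name installed_versions and the default package list are assumptions chosen for this example, not something taken from the thread:

```python
from importlib import metadata


def installed_versions(packages=("diffusers", "transformers", "peft", "accelerate")):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

For source installs from a main branch, the reported version is typically a development version string, so the commit hash may still be worth noting separately.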

Who can help?

@sayakpaul

pedropaf, Mar 15 '24 19:03

Cc: @linoytsaban

sayakpaul, Mar 16 '24 01:03

Hey @pedropaf! Could you please specify which base model and VAE you're using (i.e. the values of $MODEL_NAME and $VAE_PATH passed to --pretrained_model_name_or_path and --pretrained_vae_model_name_or_path)?

linoytsaban, Mar 26 '24 10:03

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot], Apr 19 '24 15:04

hi @pedropaf

is this still an issue? if so, can you provide the additional info @linoytsaban asked?

thanks!

yiyixuxu, Apr 19 '24 20:04

Sorry @linoytsaban @yiyixuxu, I haven't looked at this in a while. I used the following base model and VAE:

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

I can see there have been some updates since I last looked at this, so I'll give it a try and see whether I still get the issue.

pedropaf, May 17 '24 12:05

I have tried with the latest main branch and I don't get the error, so I'll close the issue.

pedropaf, May 17 '24 15:05