fast-stable-diffusion

Training

Open dayglo opened this issue 2 years ago • 2 comments

Hi,

I had this working a few days ago, but now I can't figure out what's wrong...

I'm fine-tuning 1.5 and trying to use models from Hugging Face.

If I use the base model it all works fine, but if I specify another one (nitrosocke/mo-di-diffusion, or the Adventure Time one), I get an error at the training step: 'Attempting to unscale FP16 gradients.'

Does anyone know what I could have done wrong?

'########:'########:::::'###::::'####:'##::: ##:'####:'##::: ##::'######:::
... ##..:: ##.... ##:::'## ##:::. ##:: ###:: ##:. ##:: ###:: ##:'##... ##::
::: ##:::: ##:::: ##::'##:. ##::: ##:: ####: ##:: ##:: ####: ##: ##:::..:::
::: ##:::: ########::'##:::. ##:: ##:: ## ## ##:: ##:: ## ## ##: ##::'####:
::: ##:::: ##.. ##::: #########:: ##:: ##. ####:: ##:: ##. ####: ##::: ##::
::: ##:::: ##::. ##:: ##.... ##:: ##:: ##:. ###:: ##:: ##:. ###: ##::: ##::
::: ##:::: ##:::. ##: ##:::: ##:'####: ##::. ##:'####: ##::. ##:. ######:::
:::..:::::..:::::..::..:::::..::....::..::::..::....::..::::..:::......::::

  0% 0/10000 [00:00<?, ?it/s] emmanelson  emmanelson
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 798, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 686, in main
    accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 920, in clip_grad_norm_
    self.unscale_gradients()
  File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 904, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 279, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 207, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
  0% 0/10000 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--train_text_encoder', '--image_captions_filename', '--save_starting_step=500', '--stop_text_encoder_training=5300', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/emma-adventuretime2', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/emma-adventuretime2/instance_images', '--output_dir=/content/models/emma-adventuretime2', '--instance_prompt=', '--seed=345921', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=10000']' returned non-zero exit status 1.
Something went wrong

dayglo, Dec 07 '22 13:12

Use the ckpt file from the HF repo, not the diffusers-format model.

TheLastBen, Dec 07 '22 13:12

OK, I'll give that a go! Thanks 🙏🏻

dayglo, Dec 07 '22 13:12
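
For anyone hitting the same error: PyTorch's GradScaler raises "Attempting to unscale FP16 gradients." when a parameter's gradient is itself float16, which typically means the trainable weights were loaded in half precision. Whether that is what happened with the models in this thread is an assumption, not something the log confirms. The sketch below is only a diagnostic. `fp16_parameter_names` is a hypothetical helper (not part of diffusers or accelerate) and `toy` is a stand-in for the real model. It lists trainable float16 parameters and then upcasts the module with the standard `Module.float()` call.

```python
from typing import List

import torch
from torch import nn


def fp16_parameter_names(model: nn.Module) -> List[str]:
    """Return the names of trainable parameters stored in float16."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.dtype == torch.float16
    ]


# Toy stand-in for a model whose weights were loaded in half precision.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2)).half()

# Any names listed here would produce float16 gradients during training.
print(fp16_parameter_names(toy))

# Module.float() casts all floating-point parameters and buffers to float32.
toy.float()
print(fp16_parameter_names(toy))
```

With mixed precision, the master weights normally stay in float32 and autocast handles the half-precision math, so an empty list after the cast is the state the scaler expects.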