fast-stable-diffusion
Training
Hi,
I had this working a few days ago, but now I can't figure out what's wrong.
I'm fine-tuning 1.5 and trying to use models from Hugging Face.
If I use the base model it all works fine, but if I specify another one (nitrosocke/mo-di-diffusion, or the Adventure Time one), I get an error at the training step: 'Attempting to unscale FP16 gradients.'
Does anyone know what I could have done wrong?
'########:'########:::::'###::::'####:'##::: ##:'####:'##::: ##::'######:::
... ##..:: ##.... ##:::'## ##:::. ##:: ###:: ##:. ##:: ###:: ##:'##... ##::
::: ##:::: ##:::: ##::'##:. ##::: ##:: ####: ##:: ##:: ####: ##: ##:::..:::
::: ##:::: ########::'##:::. ##:: ##:: ## ## ##:: ##:: ## ## ##: ##::'####:
::: ##:::: ##.. ##::: #########:: ##:: ##. ####:: ##:: ##. ####: ##::: ##::
::: ##:::: ##::. ##:: ##.... ##:: ##:: ##:. ###:: ##:: ##:. ###: ##::: ##::
::: ##:::: ##:::. ##: ##:::: ##:'####: ##::. ##:'####: ##::. ##:. ######:::
:::..:::::..:::::..::..:::::..::....::..::::..::....::..::::..:::......::::
0% 0/10000 [00:00<?, ?it/s] emmanelson emmanelson Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 798, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 686, in main
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 920, in clip_grad_norm_
self.unscale_gradients()
File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 904, in unscale_gradients
self.scaler.unscale_(opt)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 279, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
File "/usr/local/lib/python3.8/dist-packages/torch/cuda/amp/grad_scaler.py", line 207, in _unscale_grads_
raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
0% 0/10000 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--train_text_encoder', '--image_captions_filename', '--save_starting_step=500', '--stop_text_encoder_training=5300', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/emma-adventuretime2', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/emma-adventuretime2/instance_images', '--output_dir=/content/models/emma-adventuretime2', '--instance_prompt=', '--seed=345921', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=10000']' returned non-zero exit status 1.
Something went wrong
Use the .ckpt file from the Hugging Face repo, not the diffusers-format weights.
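For context, a hedged reading of the traceback: PyTorch's GradScaler raises this ValueError when a gradient it is asked to unscale is itself float16, which happens when the trainable weights are stored in half precision instead of float32. Whether the diffusers-format download ends up loaded that way here is an assumption, not something confirmed in this thread. A minimal sketch for checking a loaded model (the helper name find_fp16_trainable_params is made up for illustration, and nn.Linear stands in for the real UNet or text encoder):

```python
from typing import List

import torch
from torch import nn


def find_fp16_trainable_params(model: nn.Module) -> List[str]:
    """Return the names of trainable parameters stored in float16.

    Hypothetical helper: parameters listed here produce float16 gradients,
    which is the condition GradScaler rejects when unscaling.
    """
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.dtype == torch.float16
    ]


# Stand-in for a model whose weights were loaded in half precision.
model = nn.Linear(4, 4).half()
print(find_fp16_trainable_params(model))

# Casting the weights back to float32 clears the list.
model.float()
print(find_fp16_trainable_params(model))
```

If the list is non-empty for the model being trained, keeping the trainable weights in float32 and leaving the half-precision casting to --mixed_precision=fp16 is one possible workaround, at the cost of more memory for the weights.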
OK, I'll give that a go! Thanks 🙏🏻