trained v2 768 outputs are a blank tan color
All output images look like this after training the 768 model: https://imgur.com/a/Gc4rQ07
I'm aware of the issue, looking into it
Is this helpful @TheLastBen https://dushyantmin.com/fine-tuning-stable-diffusion-v20-with-dreambooth
Related:
- https://github.com/huggingface/diffusers/issues/1429
I'm getting this on both the trained v2 512 and the v2 768.
Just a quick update: I think people might just be undercooking the models. When I crank up CFG, the subject definitely comes through, and if I compare checkpoints from earlier in training, she comes through less in those.
I also noticed that when running on a different training colab, things came out fine, but that colab doesn't train the text encoder. Training the text encoder may not be ideal unless you're running a lot more steps than you're used to.
This could be wrong, but I thought I would share.
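To see whether a checkpoint is merely undercooked rather than broken, one quick check is to render the same seed at several CFG scales and compare. A minimal sketch with diffusers; the model path and instance prompt here are hypothetical placeholders, and the seed matches the training command pasted later in this thread:

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical output directory from a DreamBooth run
pipe = StableDiffusionPipeline.from_pretrained(
    "/content/models/my-dreambooth-checkpoint",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of sks person"  # hypothetical instance prompt
for cfg in (5.0, 7.5, 10.0, 12.5):
    image = pipe(
        prompt,
        guidance_scale=cfg,
        num_inference_steps=30,
        generator=torch.Generator("cuda").manual_seed(109149),
    ).images[0]
    image.save(f"cfg_{cfg}.png")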

I wasn't able to train the text encoder on free Colab, so I turned it off when I trained, and I still got the blank results.
I had the same issue with the tan output, and I think it's because my text encoder is not being trained at all. I went back to check, and text encoder training crashes with the following error when training with v2. Not sure if this helps.
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 795, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 674, in main
prior_loss = F.mse_loss(noise_pred_prior.float(), noise_prior.float(), reduction="mean")
NameError: name 'noise_pred_prior' is not defined
0% 0/5 [00:07<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--dump_only_text_encoder', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-512', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/gigglywiggsv2/instance_images', '--class_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/gigglywiggsv2/Regularization_images/Women', '--output_dir=/content/models/gigglywiggsv2', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=', '--seed=109149', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=5', '--num_class_images=200']' returned non-zero exit status 1.
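For context on that NameError: with --with_prior_preservation, the DreamBooth script stacks instance and class images in one batch and splits the UNet prediction in two before computing the loss. A rough sketch of that section (helper name and variable names are mine, not the exact diffusers code); noise_pred_prior being undefined at the prior-loss line suggests that split never ran in the v2 code path:

import torch
import torch.nn.functional as F

def dreambooth_loss(noise_pred, noise, prior_loss_weight=1.0):
    # Batch layout is [instance_images, class_images] along dim 0.
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
    noise, noise_prior = torch.chunk(noise, 2, dim=0)

    # Instance loss on the subject images.
    loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
    # Prior-preservation loss on the class/regularization images.
    prior_loss = F.mse_loss(noise_pred_prior.float(), noise_prior.float(), reduction="mean")
    return loss + prior_loss_weight * prior_loss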
fixed now
Maybe related: I tried plugging in the example as-is from the dreambooth blogpost.
The DDIM scheduler is created with "epsilon" prediction. I believe it has to be "v-prediction". Specifically this line is missing an argument:
# Setup the scheduler and pipeline
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    prediction_type="v_prediction",  # <- make sure we are doing v_prediction
)
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    scheduler=scheduler,
    safety_checker=None,
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")
This may be part of a larger family of issues that is making diffusers break with anything that isn't Euler sampling, at least for the 768 model.
Originally posted by @enzokro in https://github.com/huggingface/diffusers/issues/1429#issuecomment-1328385498
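One way to check which prediction mode a saved model is actually configured for, before blaming the sampler, is to look at its scheduler config. A quick sketch; the path is a hypothetical stand-in for whatever --output_dir you trained into:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("/content/models/my-dreambooth-checkpoint")
# The 768 v2 model should report "v_prediction"; the 512 base model reports "epsilon".
print(pipe.scheduler.config.prediction_type)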
The problem with the training script is that the loss calculation isn't adjusted for SD 2.0's v-prediction. If you look at this patch from a repo with working finetuning, there's this new code:
https://github.com/smirkingface/stable-diffusion/commit/38e28e978f58355b6c47a12936dce08a68ea90d8#diff-c412907a30683069bf476b05c2a723954768c6ca975b49b7dc753d8f4956d1e8R258-R267
I adapted it to my custom training script by adding the following function before the training loop:
def get_loss(noise_pred, noise, latents, timesteps):
    if noise_scheduler.config.prediction_type == "v_prediction":
        timesteps = timesteps.view(-1, 1, 1, 1)
        alphas_cumprod = noise_scheduler.alphas_cumprod[timesteps]
        alpha_t = torch.sqrt(alphas_cumprod)
        sigma_t = torch.sqrt(1 - alphas_cumprod)
        target = alpha_t * noise - sigma_t * latents
    else:
        target = noise
    return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
and replacing every call of F.mse_loss(...) with get_loss(noise_pred, noise, latents, timesteps). I'm currently running Dreambooth training and it looks promising. Other training scripts (e.g. Textual Inversion) need to be adjusted as well.
Originally posted by @volpeon in https://github.com/huggingface/diffusers/issues/1429#issuecomment-1328937376
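To make the replacement concrete, here is a rough sketch of how that helper could slot into the training loop. The names used here (noise_pred, latents, args.with_prior_preservation, args.prior_loss_weight) follow the script and command shown earlier in the thread; this is not the exact diffusers code:

import torch  # get_loss() is the helper above; noise_scheduler must be in scope

# Inside the training loop, after the UNet forward pass:
if args.with_prior_preservation:
    # The batch stacks [instance, class] examples along dim 0, so split everything
    # the helper needs before computing the two losses.
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
    noise, noise_prior = torch.chunk(noise, 2, dim=0)
    latents, latents_prior = torch.chunk(latents, 2, dim=0)
    timesteps, timesteps_prior = torch.chunk(timesteps, 2, dim=0)

    loss = get_loss(noise_pred, noise, latents, timesteps)
    prior_loss = get_loss(noise_pred_prior, noise_prior, latents_prior, timesteps_prior)
    loss = loss + args.prior_loss_weight * prior_loss
else:
    # The one-line swap volpeon describes, replacing
    # F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
    loss = get_loss(noise_pred, noise, latents, timesteps)

Newer diffusers releases also expose the same velocity target as noise_scheduler.get_velocity(latents, noise, timesteps), if your version has it, so the alpha/sigma math doesn't have to be hand-rolled.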
fixed now
Getting outputs like this with the latest version at 768 resolution.
Previously I was getting brown outputs at 512 and 768 whether I used the diffusers model or the ckpt.
Make sure you update the A1111 repo or simply remove the SD folder and do a clean run with the latest colab
I rewrote the notebook to run locally since I don't have much room on Google Drive, so it's a completely new install.
Actually, it's working now in diffusers. I forgot I had made it replace the A1111 repo with diffusers because it was getting OOM, and diffusers wasn't updated. I tried the original A1111 colab and I'm still getting OOM, but I'll move that to another issue.
Side note: they all came out terrible. Stability has really butchered the model beyond repair. The results are literally incomprehensible, and I'm using the same images and params as with 1.5.