trained v2 768 outputs are a blank tan color
All output images look like this after training the 768 model: https://imgur.com/a/Gc4rQ07
I'm aware of the issue, looking into it
Is this helpful @TheLastBen https://dushyantmin.com/fine-tuning-stable-diffusion-v20-with-dreambooth
Related:
- https://github.com/huggingface/diffusers/issues/1429
I'm getting this on both the trained v2 512 and the v2 768.
Just a quick update: I think people might just be undercooking the models. When I crank up CFG, the subject definitely comes through, and if I compare checkpoints from earlier in training, she comes through less in those.
I also noticed that when running on a different training colab, things came out fine, but that colab doesn't train the text encoder. Training the text encoder may not be ideal unless you're running a lot more steps than you're used to.
This could be wrong, but I thought I would share.
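To see whether a checkpoint is merely undercooked rather than broken, one quick check is to render the same seed at several CFG scales and compare. A minimal sketch with diffusers; the model path and instance prompt here are hypothetical placeholders, and the seed matches the training command pasted later in this thread:

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical output directory from a DreamBooth run
pipe = StableDiffusionPipeline.from_pretrained(
    "/content/models/my-dreambooth-checkpoint",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photo of sks person"  # hypothetical instance prompt
for cfg in (5.0, 7.5, 10.0, 12.5):
    image = pipe(
        prompt,
        guidance_scale=cfg,
        num_inference_steps=30,
        generator=torch.Generator("cuda").manual_seed(109149),
    ).images[0]
    image.save(f"cfg_{cfg}.png")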

I wasn't able to train the text encoder on free Colab, so I turned it off when I trained, and I still got the blank results.
I had the same issue with the tan output, and I think it's because my text encoder is not being trained at all. I went back to check, and text encoder training crashes with the following error when training with v2. Not sure if this helps.
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 795, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 674, in main
prior_loss = F.mse_loss(noise_pred_prior.float(), noise_prior.float(), reduction="mean")
NameError: name 'noise_pred_prior' is not defined
0% 0/5 [00:07<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_text_encoder', '--dump_only_text_encoder', '--pretrained_model_name_or_path=/content/stable-diffusion-v2-512', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/gigglywiggsv2/instance_images', '--class_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/gigglywiggsv2/Regularization_images/Women', '--output_dir=/content/models/gigglywiggsv2', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=', '--seed=109149', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=2e-6', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=5', '--num_class_images=200']' returned non-zero exit status 1.
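For context on that NameError: with --with_prior_preservation, the DreamBooth script stacks instance and class images in one batch and splits the UNet prediction in two before computing the loss. A rough sketch of that section (helper name and variable names are mine, not the exact diffusers code); noise_pred_prior being undefined at the prior-loss line suggests that split never ran in the v2 code path:

import torch
import torch.nn.functional as F

def dreambooth_loss(noise_pred, noise, prior_loss_weight=1.0):
    # Batch layout is [instance_images, class_images] along dim 0.
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
    noise, noise_prior = torch.chunk(noise, 2, dim=0)

    # Instance loss on the subject images.
    loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
    # Prior-preservation loss on the class/regularization images.
    prior_loss = F.mse_loss(noise_pred_prior.float(), noise_prior.float(), reduction="mean")
    return loss + prior_loss_weight * prior_loss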
fixed now
Maybe related: I tried plugging in the example as-is from the dreambooth blogpost.
The DDIM scheduler is created with "epsilon" prediction. I believe it has to be "v-prediction". Specifically this line is missing an argument:
# Setup the scheduler and pipeline
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    prediction_type="v_prediction",  # <- make sure we are doing v_prediction
)
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    scheduler=scheduler,
    safety_checker=None,
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")
This may be part of a larger family of issues that is making diffusers break with anything that isn't Euler sampling, at least for the 768 model.
Originally posted by @enzokro in https://github.com/huggingface/diffusers/issues/1429#issuecomment-1328385498
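One way to check which prediction mode a saved model is actually configured for, before blaming the sampler, is to look at its scheduler config. A quick sketch; the path is a hypothetical stand-in for whatever --output_dir you trained into:

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("/content/models/my-dreambooth-checkpoint")
# The 768 v2 model should report "v_prediction"; the 512 base model reports "epsilon".
print(pipe.scheduler.config.prediction_type)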
The problem with the training script is that the loss calculation isn't adjusted for SD 2.0's v-prediction. If you look at this patch from a repo with working finetuning, there's this new code:
https://github.com/smirkingface/stable-diffusion/commit/38e28e978f58355b6c47a12936dce08a68ea90d8#diff-c412907a30683069bf476b05c2a723954768c6ca975b49b7dc753d8f4956d1e8R258-R267
I adapted it to my custom training script by adding the following function before the training loop:
def get_loss(noise_pred, noise, latents, timesteps):
    if noise_scheduler.config.prediction_type == "v_prediction":
        timesteps = timesteps.view(-1, 1, 1, 1)
        alphas_cumprod = noise_scheduler.alphas_cumprod[timesteps]
        alpha_t = torch.sqrt(alphas_cumprod)
        sigma_t = torch.sqrt(1 - alphas_cumprod)
        target = alpha_t * noise - sigma_t * latents
    else:
        target = noise
    return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
and replacing every call of F.mse_loss(...) with get_loss(noise_pred, noise, latents, timesteps). I'm currently running Dreambooth training and it looks promising. Other training scripts (e.g. Textual Inversion) need to be adjusted as well.
Originally posted by @volpeon in https://github.com/huggingface/diffusers/issues/1429#issuecomment-1328937376
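To make the replacement concrete, here is a rough sketch of how that helper could slot into the training loop. The names used here (noise_pred, latents, args.with_prior_preservation, args.prior_loss_weight) follow the script and command shown earlier in the thread; this is not the exact diffusers code:

import torch  # get_loss() is the helper above; noise_scheduler must be in scope

# Inside the training loop, after the UNet forward pass:
if args.with_prior_preservation:
    # The batch stacks [instance, class] examples along dim 0, so split everything
    # the helper needs before computing the two losses.
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
    noise, noise_prior = torch.chunk(noise, 2, dim=0)
    latents, latents_prior = torch.chunk(latents, 2, dim=0)
    timesteps, timesteps_prior = torch.chunk(timesteps, 2, dim=0)

    loss = get_loss(noise_pred, noise, latents, timesteps)
    prior_loss = get_loss(noise_pred_prior, noise_prior, latents_prior, timesteps_prior)
    loss = loss + args.prior_loss_weight * prior_loss
else:
    # The one-line swap volpeon describes, replacing
    # F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
    loss = get_loss(noise_pred, noise, latents, timesteps)

Newer diffusers releases also expose the same velocity target as noise_scheduler.get_velocity(latents, noise, timesteps), if your version has it, so the alpha/sigma math doesn't have to be hand-rolled.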
fixed now
Getting outputs like this with the latest version at 768 resolution.
Previously I was getting brown outputs at 512 and 768 whether I used the diffusers model or the ckpt.
Make sure you update the A1111 repo or simply remove the SD folder and do a clean run with the latest colab
I rewrote the notebook to run locally since I don't have much room on Google Drive, so it's a completely new install.
Actually, it's working now in diffusers. I forgot I had made it replace the A1111 repo with diffusers because it was getting OOM, and diffusers wasn't updated. I tried the original A1111 colab and I'm still getting OOM, but I'll move that to another issue.
Side note: they all came out terrible. Stability has really butchered the model beyond repair. The results are literally incomprehensible, and I'm using the same images and params as with 1.5.