Cascade w/ new varnames produces funky images.
Describe the bug
Using Stable Cascade with the diffusers main branch and the HF Hub PRs 2/44 as intended, there appears to be some misconfiguration that results in super funky images, as shown below.
Old Cascade branch before the variable renaming, using the current main Cascade revisions
Diffusers master branch w/ the updated Cascade revision PRs
Reproduction
import torch
from math import ceil
from diffusers import (
    StableCascadeCombinedPipeline,
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
torch.set_grad_enabled(False)
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", revision="refs/pr/2", torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", revision="refs/pr/44", torch_dtype=torch.bfloat16)
pipe = StableCascadeCombinedPipeline(
    decoder.tokenizer,
    decoder.text_encoder,
    decoder.decoder,
    decoder.scheduler,
    decoder.vqgan,
    prior.prior,
    prior.text_encoder,
    prior.tokenizer,
    prior.scheduler,
    prior.feature_extractor,
    prior.image_encoder,
)
del prior, decoder
pipe = pipe.to("cuda")
# CPU noise for cross-machine reproducibility
size = [
    1,
    pipe.prior_pipe.prior.config.in_channels,
    ceil(1024 / pipe.prior_pipe.config.resolution_multiple),
    ceil(1024 / pipe.prior_pipe.config.resolution_multiple),
]
generator = torch.Generator("cpu").manual_seed(-2060472805)
latent_input = torch.randn(size, generator=generator, dtype=torch.float32, device="cpu").to(torch.bfloat16)
pipe(
    prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background",
    negative_prompt="",
    num_inference_steps=50,
    prior_num_inference_steps=50,
    prior_guidance_scale=3.0,
    latents=latent_input,
    generator=generator,
    width=1024,
    height=1024,
).images[0].save("/tmp/cascade_repro.png")
Logs
No response
System Info
Diffusers master, torch 2.3.0+ROCm-6.0
Who can help?
No response
Cc: @DN6 @kashif
It looks like #7287 fixed the coloration but not the smeariness.
Original
Diffusers 0.27
Made with the same script, only changing the diffusers version and cascade refs
@Beinsezii
I swapped back to the previous checkpoint here: https://huggingface.co/stabilityai/stable-cascade/discussions/50. Can you test to see whether this checkpoint is better?
import torch
from diffusers import StableCascadeDecoderPipeline
from diffusers.utils.testing_utils import load_pt
torch.set_grad_enabled(False)
prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background"
negative_prompt=""
path_prior_out = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/sd_cascade_image_embeds.ipadpt"
image_embeddings = load_pt(path_prior_out)
# current checkpoint
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
decoder.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(-2060472805)
decoder_output = decoder(
    image_embeddings=image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=0.0,
    generator=generator,
)
decoder_output.images[0].save("yiyi_test_9_out_current.png")
# with the new checkpoint
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    variant="bf16",
    revision="refs/pr/50",
    torch_dtype=torch.bfloat16,
)
decoder.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(-2060472805)
decoder_output = decoder(
    image_embeddings=image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=0.0,
    generator=generator,
)
decoder_output.images[0].save("yiyi_test_9_out_new.png")
Current output
New output with the PR
@yiyixuxu you madwoman, that's exactly it. The robot lines up 1:1 with the original cascade PR now.
I rendered a few other prompts demonstrating the problem. Current weights vs PR 50 reverted weights.
The robot flowers, frog hands, and armor plate decorations are all much more artifacted / smeary on the current weights.
What even are the 'new' weights that were introduced in PR 44? All the commit says is "update diffusers format".
Reminds me of when XL 1.0 first released and the VAE weights were updated to a version that produced artifacts and had to be reverted.
Should I close now or wait for an HF merge?
@Beinsezii it's merged!
@Beinsezii
The new weights in PR 44 were converted from https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b.safetensors, so I think the issue still exists if you use from_single_file; I also think the ComfyUI checkpoints have the same issue.
> What even are the 'new' weights that were introduced in PR 44? All the commit says is "update diffusers format".
None of the Cascade pipes implement the FromSingleFileMixin trait in the first place, ~~and the official ComfyUI Cascade example errors on UNet load for me~~ so further validation is going to be fun...
Edit: I didn't realize Comfy had its own entirely separate single-file checkpoints. I'll see how reproducible it is after the movie.
ComfyUI uses what looks like totally different sampling for Cascade, which means it's not seed-compatible.
However, the robot flowers and the dog paws/clothes have some of that sorta smeary look that the diffusers weights used to have before you merged, so while not completely conclusive I think you're right.
Should probably have a note to be investigated if/when someone adds the single file mixin to Cascade.
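For reference, a quick sanity check (just a sketch against current diffusers main) of which Cascade pipeline classes actually expose the single-file mixin:

from diffusers import (
    StableCascadeCombinedPipeline,
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
from diffusers.loaders import FromSingleFileMixin

for cls in (
    StableCascadePriorPipeline,
    StableCascadeDecoderPipeline,
    StableCascadeCombinedPipeline,
):
    # As of this thread, none of these subclass FromSingleFileMixin
    print(cls.__name__, issubclass(cls, FromSingleFileMixin))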
FWIW, I think it makes sense to go through the from_single_file section of the Stable Cascade pipeline docs to understand the scope of its support.
It also helps to go through the Cascade conversion script for getting the fuller context: https://github.com/huggingface/diffusers/blob/main/scripts/convert_stable_cascade.py.
TL;DR is (IIUC) that the original checkpoints ship with either the prior or the decoder but not with the other components involved in the pipeline (unlike other models such as SVD, SDXL, etc.). This is why from_single_file() support is implemented for the prior and decoder models only.
However, I also invite @kashif and @DN6 to correct me if I am completely wrong here.
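For context, here is a minimal sketch of what model-level single-file loading looks like for Cascade today, i.e. loading only the UNets via from_single_file and pulling the remaining components through from_pretrained (the stage_b_bf16/stage_c_bf16 file names are assumptions based on the files currently in the stabilityai/stable-cascade repo):

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
from diffusers.models import StableCascadeUNet

# Load only the UNets from the original single-file checkpoints
prior_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_c_bf16.safetensors",
    torch_dtype=torch.bfloat16,
)
decoder_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors",
    torch_dtype=torch.bfloat16,
)

# The text encoder, VQGAN, image encoder, etc. still come from the diffusers-format repos
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", prior=prior_unet, torch_dtype=torch.bfloat16
)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", decoder=decoder_unet, torch_dtype=torch.bfloat16
)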
Ah, my bad, I scrolled right past that. I wasn't expecting "single file" to be only the unet, especially when it's already present as a "single file" in the decoder/ folder?
Swapping the unet with the file @yiyixuxu linked, it's very clearly the source of the issue. Using the following script:
import torch
from math import ceil
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
from diffusers.models import StableCascadeUNet
torch.set_grad_enabled(False)
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to("cuda")
# CPU noise for cross-machine reproducibility
size = [
    1,
    prior.prior.config.in_channels,
    ceil(1024 / prior.config.resolution_multiple),
    ceil(1024 / prior.config.resolution_multiple),
]
generator = torch.Generator("cpu").manual_seed(-2060472805)
prior_out = prior(
    prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background",
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=3.0,
    width=1024,
    height=1024,
    latents=torch.randn(size, generator=generator, dtype=torch.float32, device="cpu").to(torch.bfloat16),
    generator=generator,
)
del prior
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.bfloat16).to("cuda")
# Clone the generator state so both decoder runs start from identical noise
dgen2 = torch.Generator("cpu")
dgen2.set_state(generator.get_state())
decoder(num_inference_steps=50, generator=generator, **prior_out).images[0].save("/tmp/cascade_normal.png")
decoder.decoder = StableCascadeUNet.from_single_file("/tmp/stage_b.safetensors", torch_dtype=torch.bfloat16).to("cuda")
decoder(num_inference_steps=50, generator=dgen2, **prior_out).images[0].save("/tmp/cascade_single.png")
It 1:1 reproduces both the good image after the PR 50 partial revert and the smeary image I raised in this issue when you swap in the single-file unet.
Cc: @DN6 ^.
> I wasn't expecting "single file" to be only the unet, especially when it's already present as a "single file" in the decoder/ folder?
My understanding is that even then the single-file checkpoints don't contain all the necessary components (image encoder, VQGAN, etc.).
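If it helps, a small sketch to verify that directly by listing the tensors in the stage_b single-file checkpoint (assuming it has been downloaded locally, as in the script above):

from safetensors import safe_open

# Only UNet/decoder weights are expected here; there are no text encoder,
# image encoder, or VQGAN tensors in the single-file checkpoint.
with safe_open("/tmp/stage_b.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(len(keys))
print(keys[:5])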
@Beinsezii currently from_single_file is also used to load models in their original format. I realise the name can be a bit confusing in that regard. We're working to move all this logic into from_pretrained for future versions of diffusers.
We can look into pipeline-level single file loading for Cascade, although I don't know how practical that is given the size of the model. Do you happen to have links to single file versions of Cascade that bundle all the model components that we can test with? The original model was shipped as individual components, so we didn't add pipeline support for single file out of the gate.
@DN6 I think the cc was intended to flag the single-file unet producing bad results compared to the diffusers-format one, which is what led to YiYi reverting the weights on the Cascade HF repo.
> Do you happen to have links to single file versions of Cascade that bundle all the model components that we can test with?
The ComfyUI checkpoints in main bundle CLIP + prior into one safetensors file and decoder + VQGAN into another. Personally I'm not worried about it; the only reason I even looked at the single files is that YiYi suggested they may suffer from the same smeary output issue.