Cascade w/ new varnames produces funky images.
Describe the bug
Using Stable Cascade with the diffusers main branch and the HF Hub PRs 2/44 as intended, there appears to be some misconfiguration that results in super funky images, as shown below.
Old Cascade branch before the variable renaming, using the current main Cascade revisions
Diffusers master branch w/ the updated Cascade revision PRs
Reproduction
import torch
from math import ceil
from diffusers import (
    StableCascadeCombinedPipeline,
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
torch.set_grad_enabled(False)
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", revision="refs/pr/2", torch_dtype=torch.bfloat16)
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", revision="refs/pr/44", torch_dtype=torch.bfloat16)
pipe = StableCascadeCombinedPipeline(
    decoder.tokenizer,
    decoder.text_encoder,
    decoder.decoder,
    decoder.scheduler,
    decoder.vqgan,
    prior.prior,
    prior.text_encoder,
    prior.tokenizer,
    prior.scheduler,
    prior.feature_extractor,
    prior.image_encoder,
)
del prior, decoder
pipe = pipe.to("cuda")
# CPU noise for cross-machine reproducibility
size = [
    1,
    pipe.prior_pipe.prior.config.in_channels,
    ceil(1024 / pipe.prior_pipe.config.resolution_multiple),
    ceil(1024 / pipe.prior_pipe.config.resolution_multiple),
]
generator = torch.Generator("cpu").manual_seed(-2060472805)
latent_input = torch.randn(size, generator=generator, dtype=torch.float32, device="cpu").to(torch.bfloat16)
pipe(
    prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background",
    negative_prompt="",
    num_inference_steps=50,
    prior_num_inference_steps=50,
    prior_guidance_scale=3.0,
    latents=latent_input,
    generator=generator,
    width=1024,
    height=1024,
).images[0].save("/tmp/cascade_repro.png")
Logs
No response
System Info
Diffusers master, torch 2.3.0+ROCm-6.0
Who can help?
No response
Cc: @DN6 @kashif
It looks like #7287 fixed the coloration but not the smeariness.
Original
Diffusers 0.27
Made with the same script, only changing the diffusers version and cascade refs
@Beinsezii
I swapped back to the previous checkpoint here: https://huggingface.co/stabilityai/stable-cascade/discussions/50. Can you test to see whether this checkpoint is better?
import torch
from diffusers import StableCascadeDecoderPipeline
from diffusers.utils.testing_utils import load_pt
torch.set_grad_enabled(False)
prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background"
negative_prompt=""
path_prior_out = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/sd_cascade_image_embeds.ipadpt"
image_embeddings = load_pt(path_prior_out)
# current checkpoint
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
decoder.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(-2060472805)
decoder_output = decoder(
    image_embeddings=image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=0.0,
    generator=generator,
)
decoder_output.images[0].save("yiyi_test_9_out_current.png")
# with the new checkpoint
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade",
    variant="bf16",
    revision="refs/pr/50",
    torch_dtype=torch.bfloat16,
)
decoder.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(-2060472805)
decoder_output = decoder(
    image_embeddings=image_embeddings,
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=50,
    guidance_scale=0.0,
    generator=generator,
)
decoder_output.images[0].save("yiyi_test_9_out_new.png")
Current output
New output with the PR
@yiyixuxu you madwoman, that's exactly it. The robot lines up 1:1 with the original cascade PR now.
I rendered a few other prompts demonstrating the problem. Current weights vs PR 50 reverted weights.
The robot flowers, frog hands, and armor plate decorations are all much more artifacted / smeary on the current weights.
What even are the 'new' weights that were introduced in PR 44? All the commit says is "update diffusers format".
Reminds me of when XL 1.0 first released and the VAE weights were updated to a version that produced artifacts and had to be reverted.
Should I close now or wait for an HF merge?
@Beinsezii it's merged!
@Beinsezii
The new weights in PR 44 were converted from https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b.safetensors, so I think the issue still exists if you use from_single_file; I also think the ComfyUI checkpoints have the same issue.
> What even are the 'new' weights that were introduced in PR 44? All the commit says is "update diffusers format".
None of the Cascade pipes implement the FromSingleFileMixin trait in the first place, ~~and the official ComfyUI Cascade example errors on UNet load for me~~ so further validation is going to be fun...
Edit: I didn't realize Comfy had its own entirely separate single-file checkpoints. I'll see how reproducible it is after the movie.
ComfyUI uses what looks like totally different sampling for Cascade, which means it's not seed-compatible.
However, the robot flowers and the dog paws/clothes have some of that sorta smeary look that the diffusers weights used to have before you merged, so while not completely conclusive I think you're right.
Should probably have a note to be investigated if/when someone adds the single file mixin to Cascade.
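For reference, a quick sanity check (just a sketch against current diffusers main) of which Cascade pipeline classes actually expose the single-file mixin:

from diffusers import (
    StableCascadeCombinedPipeline,
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
from diffusers.loaders import FromSingleFileMixin

for cls in (
    StableCascadePriorPipeline,
    StableCascadeDecoderPipeline,
    StableCascadeCombinedPipeline,
):
    # As of this thread, none of these subclass FromSingleFileMixin
    print(cls.__name__, issubclass(cls, FromSingleFileMixin))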
FWIW, I think it makes sense to go through the from_single_file section of the Stable Cascade pipeline docs to understand the scope of its support.
It also helps to go through the Cascade conversion script for getting the fuller context: https://github.com/huggingface/diffusers/blob/main/scripts/convert_stable_cascade.py.
TL;DR is (IIUC) that the original checkpoints ship with either the prior or the decoder but not with the other components involved in the pipeline (unlike other models such as SVD, SDXL, etc.). This is why from_single_file() support is implemented for the prior and decoder models only.
However, I also invite @kashif and @DN6 to correct me if I am completely wrong here.
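For context, here is a minimal sketch of what model-level single-file loading looks like for Cascade today, i.e. loading only the UNets via from_single_file and pulling the remaining components through from_pretrained (the stage_b_bf16/stage_c_bf16 file names are assumptions based on the files currently in the stabilityai/stable-cascade repo):

import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline
from diffusers.models import StableCascadeUNet

# Load only the UNets from the original single-file checkpoints
prior_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_c_bf16.safetensors",
    torch_dtype=torch.bfloat16,
)
decoder_unet = StableCascadeUNet.from_single_file(
    "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors",
    torch_dtype=torch.bfloat16,
)

# The text encoder, VQGAN, image encoder, etc. still come from the diffusers-format repos
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", prior=prior_unet, torch_dtype=torch.bfloat16
)
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", decoder=decoder_unet, torch_dtype=torch.bfloat16
)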
Ah, my bad, I scrolled right past that. I wasn't expecting "single file" to be only the unet, especially when it's already present as a "single file" in the decoder/ folder?
Swapping the unet with the file @yiyixuxu linked, it's very clearly the source of the issue. Using the following script:
import torch
from math import ceil
from diffusers import (
    StableCascadeDecoderPipeline,
    StableCascadePriorPipeline,
)
from diffusers.models import StableCascadeUNet
torch.set_grad_enabled(False)
prior = StableCascadePriorPipeline.from_pretrained("stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16).to("cuda")
# CPU noise for cross-machine reproducibility
size = [
    1,
    prior.prior.config.in_channels,
    ceil(1024 / prior.config.resolution_multiple),
    ceil(1024 / prior.config.resolution_multiple),
]
generator = torch.Generator("cpu").manual_seed(-2060472805)
prior_out = prior(
    prompt="photorealistic portrait artwork of an floral robot with a dark night cyberpunk city background",
    negative_prompt="",
    num_inference_steps=50,
    guidance_scale=3.0,
    width=1024,
    height=1024,
    latents=torch.randn(size, generator=generator, dtype=torch.float32, device="cpu").to(torch.bfloat16),
    generator=generator,
)
del prior
decoder = StableCascadeDecoderPipeline.from_pretrained("stabilityai/stable-cascade", torch_dtype=torch.bfloat16).to("cuda")
# Clone the generator state so both decoder runs start from identical noise
dgen2 = torch.Generator("cpu")
dgen2.set_state(generator.get_state())
decoder(num_inference_steps=50, generator=generator, **prior_out).images[0].save("/tmp/cascade_normal.png")
decoder.decoder = StableCascadeUNet.from_single_file("/tmp/stage_b.safetensors", torch_dtype=torch.bfloat16).to("cuda")
decoder(num_inference_steps=50, generator=dgen2, **prior_out).images[0].save("/tmp/cascade_single.png")
It 1:1 reproduces both the good image after the PR 50 partial revert and the smeary image I raised in this issue when you swap in the single-file unet.
Cc: @DN6 ^.
> I wasn't expecting "single file" to be only the unet, especially when it's already present as a "single file" in the decoder/ folder?
My understanding is that even then the single-file checkpoints don't contain all the necessary components (image encoder, VQGAN, etc.).
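If it helps, a small sketch to verify that directly by listing the tensors in the stage_b single-file checkpoint (assuming it has been downloaded locally, as in the script above):

from safetensors import safe_open

# Only UNet/decoder weights are expected here; there are no text encoder,
# image encoder, or VQGAN tensors in the single-file checkpoint.
with safe_open("/tmp/stage_b.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(len(keys))
print(keys[:5])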
@Beinsezii currently from_single_file is also used to load models in their original format. I realise the name can be a bit confusing in that regard. We're working to move all this logic into from_pretrained for future versions of diffusers.
We can look into pipeline-level single file loading for Cascade, although I don't know how practical that is given the size of the model. Do you happen to have links to single file versions of Cascade that bundle all the model components that we can test with? The original model was shipped as individual components, so we didn't add pipeline support for single file out of the gate.
@DN6 I think the cc was intended to flag the single-file unet producing bad results compared to the diffusers-format one, which is what led to YiYi reverting the weights on the Cascade HF repo.
> Do you happen to have links to single file versions of Cascade that bundle all the model components that we can test with?
The ComfyUI checkpoints in main bundle CLIP + prior into one safetensors file and decoder + VQGAN into another. Personally I'm not worried about it; the only reason I even looked at the single files is that YiYi suggested they may suffer from the same smeary output issue.