
examples/community/lpw_stable_diffusion_xl.py Not correctly decoded

Open HACLINE opened this issue 9 months ago • 1 comments

Describe the bug

When I used lpw_stable_diffusion_xl with a text2img model (playgroundai/playground-v2.5-1024px-aesthetic), the generated image came out gray, as if the latents had not been decoded correctly.

To fix this, I copied some code from StableDiffusionXLPipeline and used it to replace line 1889.

# image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]  This is the original one 
has_latents_mean = hasattr(self.vae.config, "latents_mean") and self.vae.config.latents_mean is not None
has_latents_std = hasattr(self.vae.config, "latents_std") and self.vae.config.latents_std is not None
if has_latents_mean and has_latents_std:
    latents_mean = (
        torch.tensor(self.vae.config.latents_mean).view(1, 4, 1, 1).to(latents.device, latents.dtype)
    )
    latents_std = (
        torch.tensor(self.vae.config.latents_std).view(1, 4, 1, 1).to(latents.device, latents.dtype)
    )
    latents = latents * latents_std / self.vae.config.scaling_factor + latents_mean
else:
    latents = latents / self.vae.config.scaling_factor

image = self.vae.decode(latents, return_dict=False)[0]

This is a naive fix, so I'm not sure whether it works in other cases.
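The same logic can be pulled out of the pipeline and checked in isolation. The sketch below is only an illustration: `denormalize_latents` is a hypothetical helper name, the statistics are made-up numbers rather than values from any real VAE config, and the forward normalization `(z - mean) * scaling_factor / std` is assumed here as the inverse of the fix above. It shows that dividing by `scaling_factor` alone does not recover the original latents when a mean and std are present.

```python
from typing import Optional, Sequence

import torch


def denormalize_latents(
    latents: torch.Tensor,
    scaling_factor: float,
    latents_mean: Optional[Sequence[float]] = None,
    latents_std: Optional[Sequence[float]] = None,
) -> torch.Tensor:
    """Undo latent scaling before VAE decoding (hypothetical helper, not a diffusers API)."""
    if latents_mean is not None and latents_std is not None:
        # Per-channel statistics, reshaped to broadcast over (batch, channels, height, width).
        mean = torch.tensor(latents_mean).view(1, -1, 1, 1).to(latents.device, latents.dtype)
        std = torch.tensor(latents_std).view(1, -1, 1, 1).to(latents.device, latents.dtype)
        return latents * std / scaling_factor + mean
    # No statistics available: fall back to the plain scaling used by the original line.
    return latents / scaling_factor


# Made-up values for illustration only; real ones would come from vae.config.
scaling_factor = 0.5
mean = [0.1, -0.2, 0.3, -0.4]
std = [1.0, 2.0, 0.5, 1.5]

raw = torch.randn(2, 4, 8, 8)
# Assumed forward normalization, i.e. the inverse of the fix above.
normalized = (raw - torch.tensor(mean).view(1, 4, 1, 1)) * scaling_factor / torch.tensor(std).view(1, 4, 1, 1)

restored = denormalize_latents(normalized, scaling_factor, mean, std)  # matches raw
naive = normalized / scaling_factor  # what the original line computes; does not match raw
```

Using `view(1, -1, 1, 1)` instead of a hard-coded `4` is a small design choice so the helper does not assume a four-channel latent space; the snippet in the issue keeps `4` because that is what StableDiffusionXLPipeline uses.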

Reproduction

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",  # perhaps you can change to other text2img to reproduce?
    custom_pipeline="lpw_stable_diffusion_xl",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

prompt = "Create a detailed and refined image of Zero Two from the anime Darling in the Franxx. She is known for her distinctive pink hair and mesmerizing green eyes. She should be depicted in a dynamic pose, showcasing her strong and fearless personality. The image should be in anime style, with an 8k resolution and a 16:9 aspect ratio. The background should be a battlefield, symbolizing the constant fights she has to face. Despite the harsh environment, she maintains a confident and determined expression. The background should be black." # This one generated by copilot, don't focus on this:)

image = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=3).images[0]

image.save("Zero Two.png")

Logs

No response

System Info

  • diffusers version: 0.27.0
  • Platform: Linux-5.15.0-102-generic-x86_64-with-glibc2.31
  • Python version: 3.10.13
  • PyTorch version (GPU?): 2.1.1 (True)
  • Huggingface_hub version: 0.21.3
  • Transformers version: 4.38.1
  • Accelerate version: 0.27.2
  • xFormers version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

HACLINE avatar May 02 '24 17:05 HACLINE

Hi, thank you for reporting the issue.

Those changes are needed for playground-v2.5 to work. Since the community pipelines are maintained by the community, I think most of them won't work with that model right now until someone updates them (usually the original contributors).

Maybe you can tag them or you can open a PR yourself if you want to contribute.

asomoza avatar May 03 '24 03:05 asomoza

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Sep 14 '24 15:09 github-actions[bot]