What does this PR do?

Adds the Kohya high-resolution fix to the Stable Diffusion pipeline. Fixes https://github.com/huggingface/diffusers/issues/7265.

How to test?

Here is a minimal example to test the pipeline. You can disable the fix by setting with_high_res_fix to False, which passes None as the high_res_fix argument to the pipeline.

from diffusers.pipelines.stable_diffusion import StableDiffusionHighResFixPipeline
import torch

generator = torch.manual_seed(42)
with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]
# high_res_fix is a pipeline init argument; passing None disables the fix
pipe = StableDiffusionHighResFixPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                                         torch_dtype=torch.float16,
                                                         use_safetensors=True,
                                                         variant="fp16",
                                                         high_res_fix=high_res_fix if with_high_res_fix else None)
pipe.to("cuda")
prompt = "a dog sitting on the couch"
# The generator is a sampling argument, so it belongs in the pipeline call, not in from_pretrained
image = pipe(prompt=prompt,
             height=1000,
             width=1600,
             num_inference_steps=50,
             generator=generator).images[0]

image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.png")

The high_res_fix argument is a list of dictionaries, each with three keys: timestep, scale_factor, and block_num. For example, you can pass [{'timestep': 900, 'scale_factor': 0.4, 'block_num': 2}, {'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}].
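Since high_res_fix is a plain list of dicts, a malformed entry is easy to pass by accident. Below is a minimal sketch of a sanity check you could run before building the pipeline. The helper name validate_high_res_fix and the value ranges it enforces are assumptions for illustration, not checks performed by the pipeline in this PR.

```python
REQUIRED_KEYS = {"timestep", "scale_factor", "block_num"}


def validate_high_res_fix(high_res_fix):
    """Hypothetical helper: sanity-check a high_res_fix config.

    Returns None when the fix is disabled, otherwise the list unchanged.
    Raises ValueError on malformed input. The ranges checked here are
    assumptions, not limits enforced by the pipeline itself.
    """
    if high_res_fix is None:
        return None  # None disables the fix
    if not isinstance(high_res_fix, list) or not high_res_fix:
        raise ValueError("high_res_fix must be None or a non-empty list of dicts")
    for i, rule in enumerate(high_res_fix):
        if not isinstance(rule, dict) or set(rule) != REQUIRED_KEYS:
            raise ValueError(f"entry {i} must have exactly the keys {sorted(REQUIRED_KEYS)}")
        # Assumed range: SD v1 schedulers are configured with 1000 training timesteps (0..999)
        if not 0 <= rule["timestep"] < 1000:
            raise ValueError(f"entry {i}: timestep must be in [0, 1000)")
        # A factor below 1 shrinks the feature map; 1 would be a no-op
        if not 0 < rule["scale_factor"] <= 1:
            raise ValueError(f"entry {i}: scale_factor must be in (0, 1]")
        if not isinstance(rule["block_num"], int) or rule["block_num"] < 0:
            raise ValueError(f"entry {i}: block_num must be a non-negative int")
    return high_res_fix


example = [
    {"timestep": 900, "scale_factor": 0.4, "block_num": 2},
    {"timestep": 600, "scale_factor": 0.5, "block_num": 1},
]
print(validate_high_res_fix(example) is example)  # True
```

Returning the list unchanged lets the check wrap the argument inline, e.g. high_res_fix=validate_high_res_fix(high_res_fix).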

I find the default value of [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}] works well enough, but you can adjust it for your use case.
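To make the meaning of the three keys concrete, here is a small sketch of how an entry could be matched during sampling. It assumes an entry is active while the current timestep is strictly greater than the entry's timestep: denoising runs from high timesteps (about 1000) down to 0, so a threshold of 600 would shrink the features at block 1 only during the early, high-noise steps, and later steps would run at full size. The helper name active_scale_factor and the "first match wins" rule are hypothetical, not taken from the PR's code.

```python
def active_scale_factor(high_res_fix, timestep, block_num):
    """Hypothetical helper: return the scale_factor to apply after down block
    `block_num` at the current `timestep`, or None if no entry applies.

    Assumption: an entry is active while the current timestep is strictly
    greater than its `timestep`. The first matching entry wins.
    """
    if high_res_fix is None:
        return None  # fix disabled
    for rule in high_res_fix:
        if rule["block_num"] == block_num and timestep > rule["timestep"]:
            return rule["scale_factor"]
    return None


default = [{"timestep": 600, "scale_factor": 0.5, "block_num": 1}]
print(active_scale_factor(default, 801, block_num=1))  # 0.5 (early step: shrink)
print(active_scale_factor(default, 401, block_num=1))  # None (late step: full size)
print(active_scale_factor(default, 801, block_num=0))  # None (different block)
```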

Here are some examples at a resolution of 1000x1600 (height x width):

Prompt = "a dog sitting on the couch"
Without the fix: [image: a_dog_sitting_on_the_couch_fix=False]
With the fix: [image: a_dog_sitting_on_the_couch_fix=True]

Prompt = "a pig sitting behind the desk"
Without the fix: [image: a_pig_sitting_behind_the_desk_fix=False]
With the fix: [image: a_pig_sitting_behind_the_desk_fix=True]

Before submitting

[ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you read the contributor guideline?
[x] Did you read our philosophy doc (important for complex PRs)?
[x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
[ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
[ ] Did you write any new necessary tests?

Who can review?

@yiyixuxu @sayakpaul

Apr 10 '24 15:04 sajadn

Thanks for this PR @sajadn ! This is very helpful.

Apr 10 '24 19:04 srelbo

Thanks! Can we move this to the community folder?

Apr 10 '24 21:04 yiyixuxu

You mean moving the new pipeline to the examples/community folder? What about unet_2d_condition_high_res?

Apr 10 '24 22:04 sajadn

it can go into the same file :)

Apr 11 '24 00:04 yiyixuxu

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Apr 11 '24 18:04 HuggingFaceDocBuilderDev

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

May 11 '24 15:05 github-actions[bot]

gentle ping :)

May 13 '24 02:05 yiyixuxu

Hey, sorry about the long delay! I put everything in a single file named kohya_hires_fix.py in examples/community folder. I also updated the "how to test?" section in the thread above to use custom_pipelines. Let me know if it still needs adjustments. Cheers!

May 27 '24 16:05 sajadn

thanks so much for the contribution!! cc @asomoza here for awareness: another tool in our toolbox! let's recommend it whenever you see fit and keep an eye on it for official integration :)

May 28 '24 20:05 yiyixuxu

@sajadn I've run your code, but the fix doesn't seem to work. Here's the code I'm running:

from diffusers import DiffusionPipeline
import torch

with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]
pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                         custom_pipeline="/content/kohya_hires_fix.py",
                                         torch_dtype=torch.float16,
                                         use_safetensors=True,
                                         variant="fp16",
                                         high_res_fix=high_res_fix if with_high_res_fix else None)
pipe.to("cuda")
prompt = "a dog sitting on the couch"
image = pipe(prompt=prompt, height=1024, width=1024, num_inference_steps=30).images[0]

image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.jpg")

But here are the results:

[image: a_dog_sitting_on_the_couch_fix=True]

Am I doing something wrong?

Jun 04 '24 01:06 Depfek6

I think you're expecting too much of this. Most of the examples I've seen aren't that great, and yours looks like most of them; you probably need to keep re-rolling until you get a good seed.

Jun 07 '24 02:06 asomoza

@Depfek6 I'd say play with the parameters. Try decreasing scale_factor further, to something like 0.25, or decreasing timestep to 500; you should be able to reduce the number of dog heads to one :D
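The two suggested tweaks act on different axes: lowering timestep keeps the fix active for more denoising steps, while lowering scale_factor shrinks the features more aggressively. The rough arithmetic below sketches both. It assumes a hypothetical evenly spaced 50-step schedule (999, 979, ..., 19) and that an entry is active while the current timestep is above its threshold; real schedulers may space steps differently. For a 1000x1600 image, a scale_factor of 0.5 makes the shrunken features roughly the size a 500x800 image would produce, close to the 512x512 resolution SD v1 was trained at, whereas 0.25 corresponds to about 250x400.

```python
import math


def steps_affected(threshold, num_inference_steps=50, num_train_timesteps=1000):
    """Count denoising steps whose timestep is above `threshold`.

    Assumption: a hypothetical evenly spaced schedule 999, 979, ..., 19
    for 50 steps; real schedulers may space timesteps differently.
    """
    stride = num_train_timesteps // num_inference_steps
    timesteps = [num_train_timesteps - 1 - stride * i for i in range(num_inference_steps)]
    return sum(t > threshold for t in timesteps)


def equivalent_resolution(height, width, scale_factor):
    """Image size whose features roughly match the shrunken ones (floor rounding)."""
    return math.floor(height * scale_factor), math.floor(width * scale_factor)


print(steps_affected(600), steps_affected(500))  # 20 25
print(equivalent_resolution(1000, 1600, 0.5))   # (500, 800)
print(equivalent_resolution(1000, 1600, 0.25))  # (250, 400)
```

Under these assumptions, moving the threshold from 600 to 500 extends the fix from 20 to 25 of the 50 steps, giving the model more low-resolution steps to settle on a single subject before details are refined.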

Jun 11 '24 23:06 sajadn

diffusers

Add Kohya fix to SD pipeline for high resolution generation