Add Kohya fix to SD pipeline for high resolution generation

Open sajadn opened this issue 10 months ago • 7 comments

What does this PR do?

Adds Kohya fix to Stable Diffusion pipeline. Fixes https://github.com/huggingface/diffusers/issues/7265.

How to test?

Here is a minimal example to test the pipeline. You can disable the fix by setting with_high_res_fix to False, which passes None to the pipeline as the high_res_fix argument.

from diffusers.pipelines.stable_diffusion import StableDiffusionHighResFixPipeline
import torch

with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]
pipe = StableDiffusionHighResFixPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                                         torch_dtype=torch.float16,
                                                         use_safetensors=True,
                                                         variant="fp16",
                                                         high_res_fix=high_res_fix if with_high_res_fix else None)
pipe.to("cuda")
# The generator belongs to the sampling call, not to from_pretrained.
generator = torch.manual_seed(42)
prompt = "a dog sitting on the couch"
image = pipe(prompt=prompt,
             height=1000,
             width=1600,
             num_inference_steps=50,
             generator=generator).images[0]

image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.png")

The high_res_fix argument should be a list of dictionaries, each with three keys: timestep, scale_factor, and block_num. For example, you can pass a high_res_fix of [{'timestep': 900, 'scale_factor': 0.4, 'block_num': 2}, {'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}].

I find that the default value of [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}] works well enough, but users can adjust it for their use case.
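
To make the expected shape of high_res_fix concrete, here is a minimal sketch of a sanity check that could be run before handing the list to the pipeline. The helper name check_high_res_fix is hypothetical and not part of this PR; it only checks the three keys described above and makes no assumption about valid value ranges.

```python
from typing import Any, Dict, List, Optional

# The three keys each entry is described as having in this PR.
REQUIRED_KEYS = {"timestep", "scale_factor", "block_num"}


def check_high_res_fix(
    high_res_fix: Optional[List[Dict[str, Any]]],
) -> Optional[List[Dict[str, Any]]]:
    """Hypothetical helper (not part of the PR): verify the list-of-dicts shape.

    None is accepted because the PR uses None to disable the fix.
    """
    if high_res_fix is None:
        return None
    if not isinstance(high_res_fix, list):
        raise TypeError("high_res_fix must be a list of dictionaries or None")
    for index, entry in enumerate(high_res_fix):
        if not isinstance(entry, dict):
            raise TypeError(f"entry {index} is not a dictionary")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    return high_res_fix


# Example: the two-entry configuration quoted above passes the check.
config = check_high_res_fix([
    {"timestep": 900, "scale_factor": 0.4, "block_num": 2},
    {"timestep": 600, "scale_factor": 0.5, "block_num": 1},
])
```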

Here are some examples with a resolution of 1000x1600:

Prompt = "a dog sitting on the couch"

without the fix: a_dog_sitting_on_the_couch_fix=False

with the fix: a_dog_sitting_on_the_couch_fix=True

Prompt = "a pig sitting behind the desk"

without the fix: a_pig_sitting_behind_the_desk_fix=False

with the fix: a_pig_sitting_behind_the_desk_fix=True

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [x] Did you read the contributor guideline?
  • [x] Did you read our philosophy doc (important for complex PRs)?
  • [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

@yiyixuxu @sayakpaul

sajadn avatar Apr 10 '24 15:04 sajadn

Thanks for this PR @sajadn ! This is very helpful.

srelbo avatar Apr 10 '24 19:04 srelbo

thanks! can we move this to community folder?

yiyixuxu avatar Apr 10 '24 21:04 yiyixuxu

You mean moving the new pipeline to the examples/community folder? How about unet_2d_condition_high_res?

sajadn avatar Apr 10 '24 22:04 sajadn

it can go into the same file :)

yiyixuxu avatar Apr 11 '24 00:04 yiyixuxu

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar May 11 '24 15:05 github-actions[bot]

gentle ping :)

yiyixuxu avatar May 13 '24 02:05 yiyixuxu

Hey, sorry about the long delay! I put everything in a single file named kohya_hires_fix.py in examples/community folder. I also updated the "how to test?" section in the thread above to use custom_pipelines. Let me know if it still needs adjustments. Cheers!

sajadn avatar May 27 '24 16:05 sajadn

thanks so much for the contribution!! cc @asomoza here for awareness: another tool in our toolbox! let's recommend it whenever you see fit and keep an eye on it for official integration :)

yiyixuxu avatar May 28 '24 20:05 yiyixuxu

@sajadn I've run your code, but the fix doesn't seem to work. Here's the code I'm running:

from diffusers import DiffusionPipeline
import torch

with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]
pipe = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                         custom_pipeline="/content/kohya_hires_fix.py",
                                         torch_dtype=torch.float16,
                                         use_safetensors=True,
                                         variant="fp16",
                                         high_res_fix=high_res_fix if with_high_res_fix else None)
pipe.to("cuda")
prompt = "a dog sitting on the couch"
image = pipe(prompt=prompt,
             height=1024,
             width=1024,
             num_inference_steps=30).images[0]

image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.jpg")

But here are the results:

a_dog_sitting_on_the_couch_fix=True (1)

Am I doing something wrong?

Depfek6 avatar Jun 04 '24 01:06 Depfek6

I think you're expecting too much of this. Most of the examples I see aren't that great, and yours looks like most of them; you probably need to keep re-rolling until you get a good seed.
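
For anyone who wants to try the re-rolling suggestion, here is a minimal sketch of a seed sweep. The reroll helper and the generate callback are hypothetical names for illustration; the commented usage assumes a pipe object built as in the snippet above, with torch installed and a CUDA device available.

```python
from typing import Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def reroll(generate: Callable[[int], T], seeds: Iterable[int]) -> Dict[int, T]:
    """Hypothetical helper: call generate(seed) once per seed, keyed by seed."""
    return {seed: generate(seed) for seed in seeds}


# Usage sketch (assumes `pipe` from the snippet above and a CUDA device):
#
# import torch
#
# def generate(seed):
#     generator = torch.Generator(device="cuda").manual_seed(seed)
#     return pipe(prompt="a dog sitting on the couch",
#                 height=1024, width=1024,
#                 num_inference_steps=30,
#                 generator=generator).images[0]
#
# for seed, image in reroll(generate, range(4)).items():
#     image.save(f"dog_seed={seed}.jpg")
```

Keeping the seed in the file name makes it easy to reproduce whichever image turns out best.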

asomoza avatar Jun 07 '24 02:06 asomoza

@Depfek6 I'd say play with the parameters. Decrease the scale_factor further to something like 0.25, or decrease the timestep to 500; you should be able to reduce the number of dog heads to one :D
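
A minimal sketch of how those variations could be written down as candidate configurations, starting from the default mentioned earlier in the thread. The with_overrides helper is a hypothetical name, not part of the pipeline; each candidate would be passed as high_res_fix when loading the pipeline.

```python
from copy import deepcopy
from typing import Any, Dict, List

# Default configuration quoted in the PR description.
DEFAULT_FIX: List[Dict[str, Any]] = [
    {"timestep": 600, "scale_factor": 0.5, "block_num": 1}
]


def with_overrides(base: List[Dict[str, Any]], **overrides: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: copy `base` and apply the overrides to every entry."""
    result = deepcopy(base)  # leave the original list untouched
    for entry in result:
        entry.update(overrides)
    return result


candidates = [
    DEFAULT_FIX,                                     # baseline
    with_overrides(DEFAULT_FIX, scale_factor=0.25),  # smaller scale_factor
    with_overrides(DEFAULT_FIX, timestep=500),       # lower timestep
]
```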

sajadn avatar Jun 11 '24 23:06 sajadn