diffusers
Add Kohya fix to SD pipeline for high resolution generation
What does this PR do?
Adds Kohya fix to Stable Diffusion pipeline. Fixes https://github.com/huggingface/diffusers/issues/7265.
To Test?
Here is a minimal example to test the pipeline. To disable the fix, set with_high_res_fix to False, which passes None to the pipeline as the high_res_fix argument.
import torch
from diffusers.pipelines.stable_diffusion import StableDiffusionHighResFixPipeline

# Set to False to pass None as high_res_fix and disable the fix.
with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]

pipe = StableDiffusionHighResFixPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    high_res_fix=high_res_fix if with_high_res_fix else None,
)
pipe.to("cuda")

# The generator is an argument of the pipeline call, not of from_pretrained.
generator = torch.manual_seed(42)
prompt = "a dog sitting on the couch"
image = pipe(
    prompt=prompt,
    height=1000,
    width=1600,
    num_inference_steps=50,
    generator=generator,
).images[0]
image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.png")
The high_res_fix argument should be a list of dictionaries, each with three keys: timestep, scale_factor, and block_num. For example, you can pass a high_res_fix of [{'timestep': 900, 'scale_factor': 0.4, 'block_num': 2}, {'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}].
I find the default value of [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}] to work well enough, but users can modify it for their use case.
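To make the expected shape of high_res_fix concrete, here is a small sketch of a check one could run before handing the list to the pipeline. The helper name validate_high_res_fix and the REQUIRED_KEYS constant are hypothetical and not part of this PR; the sketch only checks that the three keys described above are present and makes no assumption about valid value ranges.

```python
# Hypothetical helper, not part of the PR: checks only the structure
# described above (a list of dicts with three specific keys).
REQUIRED_KEYS = {"timestep", "scale_factor", "block_num"}


def validate_high_res_fix(high_res_fix):
    """Return high_res_fix unchanged if it is None or a well-formed list."""
    if high_res_fix is None:
        # None is how the example above disables the fix.
        return None
    if not isinstance(high_res_fix, list):
        raise TypeError("high_res_fix must be a list of dictionaries or None")
    for index, entry in enumerate(high_res_fix):
        if not isinstance(entry, dict):
            raise TypeError(f"entry {index} is not a dictionary")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
    return high_res_fix


# The two-entry example from the description passes the check.
validate_high_res_fix([
    {"timestep": 900, "scale_factor": 0.4, "block_num": 2},
    {"timestep": 600, "scale_factor": 0.5, "block_num": 1},
])
```

Catching a missing key this way is only a convenience; the pipeline itself may or may not perform similar checks.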
Here are some examples with a resolution of 1000x1600:
Prompt = "a dog sitting on the couch"
without the fix:
with the fix:
Prompt = "a pig sitting behind the desk"
without the fix:
with the fix:
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [x] Did you read the contributor guideline?
- [x] Did you read our philosophy doc (important for complex PRs)?
- [x] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
@yiyixuxu @sayakpaul
Thanks for this PR @sajadn ! This is very helpful.
thanks! can we move this to community folder?
You mean moving the new pipeline to examples/community folder? how about unet_2d_condition_high_res?
it can go into the same file :)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
gentle ping :)
Hey, sorry about the long delay! I put everything in a single file named kohya_hires_fix.py in the examples/community folder. I also updated the "To Test?" section in the thread above to use custom_pipeline. Let me know if it still needs adjustments. Cheers!
thanks so much for the contribution!! cc @asomoza here for awareness: another tool in our toolbox! let's recommend it whenever you see fit and keep an eye on it for official integration :)
@sajadn I've run your code, but the fix doesn't seem to work. Here's the code I'm running:
from diffusers import DiffusionPipeline
import torch

with_high_res_fix = True
high_res_fix = [{'timestep': 600, 'scale_factor': 0.5, 'block_num': 1}]
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="/content/kohya_hires_fix.py",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    high_res_fix=high_res_fix if with_high_res_fix else None,
)
pipe.to("cuda")
prompt = "a dog sitting on the couch"
image = pipe(prompt=prompt, height=1024, width=1024, num_inference_steps=30).images[0]
image.save(f"{prompt.replace(' ', '_')}_fix={with_high_res_fix}.jpg")
But here are the results:
Am I doing something wrong?
I think you're expecting too much of this. Most of the examples I see aren't that great, and yours looks like most of them. You probably need to keep re-rolling until you get a good seed.
@Depfek6 I'd say play with the parameters. Decrease the scale_factor further to something like 0.25, or decrease the timestep to 500. You should be able to reduce the number of dog heads to one :D
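As a rough sketch of that suggestion, the snippet below builds the candidate high_res_fix values to try, combining the default settings with the lower scale_factor and timestep mentioned above. The function name candidate_fixes is hypothetical. Because the thread's example passes high_res_fix to from_pretrained, each candidate would be tried by loading the pipeline with it, as in the code earlier in the thread.

```python
from itertools import product


def candidate_fixes(timesteps=(600, 500), scale_factors=(0.5, 0.25), block_num=1):
    """Hypothetical helper: one single-entry high_res_fix list per combination."""
    return [
        [{"timestep": t, "scale_factor": s, "block_num": block_num}]
        for t, s in product(timesteps, scale_factors)
    ]


# Print each candidate; every one has the same shape as the default value.
for fix in candidate_fixes():
    print(fix)
```

Keeping the seed fixed while sweeping these values makes it easier to tell whether a change in the image comes from the parameters or from the random starting noise.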