Mixture-of-Experts partial diffusion implementation for base SD 1.x / 2.x pipelines
What does this PR do?
This pull request ports the SDXL denoising_start and denoising_end logic to the base SD text2img and img2img pipelines.
This brings legacy SD model capabilities in line with SDXL.
Enhances #4003
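For context, here is an illustrative standalone sketch (not the code in this PR) of the timestep-cutoff arithmetic that denoising_end / denoising_start use in the SDXL pipelines, which this PR ports; the step counts and split value below are assumptions for illustration, and the real pipelines read num_train_timesteps from the scheduler config:
# illustrative sketch of the denoising_end / denoising_start split point
num_train_timesteps = 1000
num_inference_steps = 40
split = 0.8  # denoising_end of the first expert == denoising_start of the second
# mimic scheduler.timesteps after set_timesteps(num_inference_steps)
timesteps = list(range(num_train_timesteps - 1, -1, -(num_train_timesteps // num_inference_steps)))
cutoff = int(round(num_train_timesteps - split * num_train_timesteps))
first_expert_steps = [t for t in timesteps if t >= cutoff]   # high-noise portion
second_expert_steps = [t for t in timesteps if t < cutoff]   # low-noise remainder
print(len(first_expert_steps), len(second_expert_steps))     # 32 8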
Example
Using Stable Diffusion 2.1 fine-tuned with zero terminal SNR:
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the contributor guideline?
- [ ] Did you read our philosophy doc (important for complex PRs)?
- [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- [ ] Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@patrickvonplaten making sure this one doesn't get lost
Very cool addition! Could we maybe add some tests? Think they can be very similar to the ones we added here: https://github.com/huggingface/diffusers/blob/d0b8de1262ba785474fc9df53c29ba44ec02c715/tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl.py#L264 (feel free to copy-paste)
Let us know if you need any help with the tests or currently failing tests @bghira :-)
@patrickvonplaten sorry, i've been really busy testing SDXL training and haven't had time to follow up here. i would be glad for the assist!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@patrickvonplaten
@bghira for which SD 1.x / SD 2.x models does mixture of experts work well?
the result above is actually from passing the partially diffused output of SD 2.1 through SDXL.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@yiyixuxu WDYT?
I don't think we need to prio this PR at the moment
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
come on
@bghira sorry to be so late here. Could you maybe post a code snippet showing how this PR enables high quality images with Mixture of Experts for SD 1.x and 2.x? Given that we don't have a 1.x or 2.x denoiser checkpoint, I'm a bit unsure whether this is needed yet, to be honest.
my trainer can make them, and my 2.x checkpoints can make use of this. i showed images above. it sounds like you really just don't want this. you can reject it; people can just use some other library or toolkit to make it happen
fwiw this even allows using the sdxl refiner to complete inference on sd 1.5 or 2.x
> fwiw this even allows using the sdxl refiner to complete inference on sd 1.5 or 2.x
Can you add a quick code snippet for this?
from diffusers import DiffusionPipeline
import torch

# load the SD 2.1 base and the SDXL refiner
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# run the first 80% of the schedule on the SD 2.1 base
base_image = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="pil",
).images[0]

# hand the partially denoised output to the SDXL refiner for the remaining 20%
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=base_image,
).images[0]

base_image.save('base_image.png', format='PNG')
image.save('image.png', format='PNG')
to use a SD 2.x zero-terminal SNR checkpoint that is finetuned for steps 0-400:
from diffusers import DiffusionPipeline
import torch

# SD 2.1 handles the first 80% of the schedule; the SD 2.x fine-tune finishes the rest
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "ptx0/pseudo-flex-base",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# output_type="latent" hands the partially denoised latents to the second expert,
# so it can pick up exactly where the first one stopped
base_latents = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=base_latents,
).images[0]

image.save('image.png', format='PNG')
I'm not getting great results with https://github.com/huggingface/diffusers/pull/4355#issuecomment-1906728628 and https://github.com/huggingface/diffusers/pull/4355#issuecomment-1906736194 doesn't work on this branch for me.
I'm ok adding it though; it makes sense to have this functionality when there is already a refiner model trained for SD 2.1.
@yiyixuxu could you give this a review?
ah, yeah, to be fair i haven't pulled this branch in some time. it's not very easy for me to test this stuff locally, as i'm in Central America and downloading these models takes a very long time, if it completes at all.
once it is in though, i can revisit it and put some compute toward training a specific refiner for this case.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Not stale.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@bghira do we still want to support this? happy to help with the PR but it only makes sense if we are going to have the checkpoints though
yes, but any SD model can be an expert. pick one for composition and one for details or style, and split the job between them
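A hypothetical sketch of that split (the model names are placeholders, and the parameter handling assumes the denoising_start / denoising_end port in this PR mirrors the SDXL pipelines):
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
import torch

# placeholder "composition" and "detail" experts; substitute any SD 1.x / 2.x checkpoints
composition = StableDiffusionPipeline.from_pretrained(
    "path/to/composition-expert", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")
detail = StableDiffusionImg2ImgPipeline.from_pretrained(
    "path/to/detail-expert", torch_dtype=torch.float16, use_safetensors=True
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# first expert lays out the composition over the first 60% of the schedule
latents = composition(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.6,
    output_type="latent",
).images

# second expert refines details and style over the remaining 40%
image = detail(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.6,
    image=latents,
).images[0]
image.save('image.png', format='PNG')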
@bghira
would you be able to provide examples so we can make a doc page for this?
Code-wise, I think there's not much left to finish up. Do you want to finish it up, or would you prefer that I take over this PR?
@yiyixuxu actually i'll be able to put more effort forth on this soon, thanks for the patience