
Mixture-of-Experts partial diffusion implementation for base SD 1.x / 2.x pipelines

Open · bghira opened this pull request 1 year ago · 32 comments

What does this PR do?

This pull request ports the denoising_end support from the SDXL text2img pipeline to the base text2img pipeline, and the denoising_start and denoising_end support from the SDXL img2img pipeline to the base img2img pipeline.

This brings legacy SD model capabilities in line with SDXL.

Enhances #4003

Example

Used on a Stable Diffusion 2.1 checkpoint fine-tuned with zero terminal SNR:

(four example output images)
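
For orientation, here is a minimal sketch of the call pattern this PR enables on the base pipeline. The model name and values are illustrative rather than the exact setup behind the images above; complete two-model snippets appear later in the thread.

from diffusers import DiffusionPipeline
import torch

# Load the base SD 2.1 pipeline (any SD 1.x / 2.x checkpoint works the same way).
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# denoising_end (added by this PR, mirroring SDXL) stops the base model after the
# first 80% of the schedule and returns latents for a second model to finish.
latents = base(
    "A majestic lion jumping from a big stone at night",
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images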

Before submitting

  • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [ ] Did you read the contributor guideline?
  • [ ] Did you read our philosophy doc (important for complex PRs)?
  • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • [ ] Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

bghira · Jul 29 '23

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten making sure this one doesn't get lost

bghira · Aug 02 '23

Very cool addition! Could we maybe add some tests? I think they can be very similar to the ones we added here: https://github.com/huggingface/diffusers/blob/d0b8de1262ba785474fc9df53c29ba44ec02c715/tests/pipelines/stable_diffusion_xl/test_stable_diffusion_xl.py#L264 (feel free to copy-paste)

patrickvonplaten · Aug 03 '23

Let us know if you need any help with the new tests or the currently failing tests, @bghira :-)

patrickvonplaten · Aug 23 '23

@patrickvonplaten sorry, I've been really busy testing SDXL training and haven't had time to follow up here. I would be glad for the assist!

bghira · Aug 23 '23

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Oct 18 '23

@patrickvonplaten

bghira · Oct 18 '23

@bghira for which SD 1.x / SD 2.x models does mixture of experts work well?

patrickvonplaten · Oct 28 '23

The result above is actually from passing the partially diffused output of SD 2.1 through SDXL.

bghira · Oct 28 '23

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Nov 22 '23

@yiyixuxu WDYT?

sayakpaul · Nov 27 '23

I don't think we need to prio this PR at the moment

patrickvonplaten · Nov 27 '23

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Dec 26 '23

come on

bghira · Jan 19 '24

@bghira sorry to be so late here. Could you maybe post a code snippet showing how this PR enables high-quality images with Mixture of Experts for SD 1.x and 2.x? Given that we don't have a 1.x or 2.x denoiser checkpoint, I'm a bit unsure whether this is needed yet, to be honest.

patrickvonplaten · Jan 19 '24

My trainer can make them, and my 2.x checkpoints can make use of this. I showed images above. It sounds like you really just don't want this. You can reject it; people can just use some other library or toolkit to make it happen.

bghira · Jan 19 '24

FWIW, this even allows using the SDXL refiner to complete inference on SD 1.5 or 2.x.

bghira · Jan 19 '24

FWIW, this even allows using the SDXL refiner to complete inference on SD 1.5 or 2.x.

Can you add a quick code snippet for this?

patrickvonplaten · Jan 23 '24

from diffusers import DiffusionPipeline
import torch

# Base model: SD 2.1 handles the first part of the denoising schedule.
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

# The SDXL refiner picks up where the base model stops.
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# Run the base model over the first 80% of the schedule (denoising_end=0.8).
base_image = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="pil",
).images[0]
# Hand the partially denoised image to the refiner for the final 20%.
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=base_image,
).images[0]
base_image.save('base_image.png', format='PNG')
image.save('image.png', format='PNG')

bghira · Jan 23 '24
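
For intuition, the 0.8 split used above works out roughly as follows. This is a sketch assuming the usual 1000-step training schedule, not the pipelines' exact cutoff code.

# Rough arithmetic behind denoising_end=0.8 / denoising_start=0.8 with 40 steps.
num_train_timesteps = 1000   # standard SD training schedule length (assumed)
num_inference_steps = 40
split = 0.8

cutoff_timestep = int(round(num_train_timesteps * (1 - split)))  # hand-off near t = 200
base_steps = int(round(num_inference_steps * split))             # ~32 steps on the base model
refiner_steps = num_inference_steps - base_steps                 # ~8 steps on the refiner

print(cutoff_timestep, base_steps, refiner_steps)                # 200 32 8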

To use an SD 2.x zero-terminal-SNR checkpoint that is fine-tuned for steps 0-400:

from diffusers import DiffusionPipeline
import torch

# Base model: SD 2.1 covers the high-noise portion of the schedule.
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

# SD 2.x fine-tune acting as the low-noise "refiner" expert (needs the
# denoising_start / image support this PR adds to the img2img path).
refiner = DiffusionPipeline.from_pretrained(
    "ptx0/pseudo-flex-base",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# Stop the base model at 80% of the schedule and return the raw latents.
base_latents = base(
    prompt=prompt,
    num_inference_steps=40,
    denoising_end=0.8,
    output_type="latent",
).images
# Finish the remaining 20% of the schedule from those latents. The latents
# cannot be saved as a PNG directly, so only the final image is saved.
image = refiner(
    prompt=prompt,
    num_inference_steps=40,
    denoising_start=0.8,
    image=base_latents,
).images[0]
image.save('image.png', format='PNG')

bghira · Jan 23 '24

I'm not getting great results with https://github.com/huggingface/diffusers/pull/4355#issuecomment-1906728628, and https://github.com/huggingface/diffusers/pull/4355#issuecomment-1906736194 doesn't work on this branch for me.

I'm OK adding it, though; it makes sense to have this functionality once we have a refiner model trained for SD 2.1.

patrickvonplaten · Feb 09 '24

@yiyixuxu could you give this a review?

patrickvonplaten · Feb 09 '24

Ah, yeah, to be fair I haven't pulled this branch in some time. It's not very easy for me to test this stuff locally, as I'm in Central America and downloading these models takes a very long time, if it completes at all.

Once it is in, though, I can revisit it and put some compute toward training a specific refiner for this case.

bghira · Feb 09 '24

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Mar 07 '24

Not stale.

sayakpaul · Mar 07 '24

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] · Apr 02 '24

@bghira do we still want to support this? Happy to help with the PR, but it only makes sense if we are going to have the checkpoints.

yiyixuxu · Apr 03 '24

Yes, but any SD model can be an expert: pick one for composition and one for details or style, and split the job between them.

bghira · Apr 03 '24
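
As a rough illustration of that two-expert split: the sketch below uses two placeholder SD 1.5-family checkpoint names and an arbitrary 60/40 split, and it assumes the denoising_end / denoising_start support this PR adds to the base pipelines.

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
import torch

# Hypothetical "composition" expert (placeholder model name).
composition = StableDiffusionPipeline.from_pretrained(
    "your-org/sd15-composition-expert", torch_dtype=torch.float16
).to("cuda")

# Hypothetical "detail / style" expert, loaded as img2img so it can resume from latents.
detail = StableDiffusionImg2ImgPipeline.from_pretrained(
    "your-org/sd15-detail-expert", torch_dtype=torch.float16
).to("cuda")

prompt = "A majestic lion jumping from a big stone at night"

# The composition expert handles the high-noise 60% of the schedule.
latents = composition(
    prompt=prompt,
    num_inference_steps=30,
    denoising_end=0.6,
    output_type="latent",
).images

# The detail expert finishes the remaining 40% from the shared latent space.
image = detail(
    prompt=prompt,
    num_inference_steps=30,
    denoising_start=0.6,
    image=latents,
).images[0]
image.save("two_expert_moe.png")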

@bghira

would you be able to provide examples so we can make a doc page for this?

Code-wise, I think there's not much left to finish up. Do you want to finish it, or would you prefer that I take over this PR?

yiyixuxu · Apr 03 '24

@yiyixuxu actually I'll be able to put more effort into this soon, thanks for the patience.

bghira · Apr 27 '24