What does this PR do?

We add a set of pipelines implementing the LEDITS++ image editing method, with implementations for Stable Diffusion, SD-XL, and DeepFloyd IF. Additionally, we made some minor adjustments to the DPM-Solver scheduler to support image inversion.

There are still some obvious TODOs left that we would appreciate some help with:

[x] Finalize IF implementation
[x] Adjust input checks to the new image-editing setting (e.g., fail gracefully if inference is called before inversion)
[x] Add documentation
[x] Write unit tests

@patrickvonplaten, @apolinario, @linoytsaban you should all be able to commit to our fork. It would be great if you could help @kathath and me out a bit 😄

Who can review?

@patrickvonplaten

Dec 06 '23 11:12 manuelbrack

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Dec 07 '23 12:12 HuggingFaceDocBuilderDev

@kathath, @manuelbrack Yay! Can we move it out of draft? 🔥😁

Jan 04 '24 16:01 linoytsaban

Maybe @sayakpaul can take a look? 😄

Jan 12 '24 12:01 linoytsaban

Can we give this another review? (cc @yiyixuxu and maybe also @DN6)

Feb 13 '24 15:02 patrickvonplaten

This looks very promising! A few comments to polish it for production:

- Neither the SD nor the SDXL pipeline works with model CPU offloading, since `invert()` does not move `text_encoder` from CPU to GPU.
- `LEditsPPPipelineStableDiffusionXL` fails in `encode_image` inside `invert()` when running in fp16: the VAE is upcast by default, which causes a float vs. half runtime error. Workaround: `pipe.vae.config.force_upcast = False`.
- The pipelines do not work with a GPU `torch.Generator()`, since the tensor is never moved to the same device as the generator-created noise:
  ```python
  xts[idx] = self.scheduler.add_noise(x0, noise, torch.Tensor([t]))
  ```
- The SDXL pipeline does not accept the `width`/`height`/`clip_skip` arguments that the SD pipeline does.
- Neither pipeline sets `pipe.num_timesteps`, which makes it hard to check progress in callbacks, etc. It should be as simple as `pipe.num_timesteps = len(pipe.inversion_steps)`.
- `guidance_rescale > 0` causes a runtime error, since `noise_pred_edit_concepts` is a tuple, so `.mean()` does not work.
- If `source_guidance_scale` is <= 1, it causes runtime errors instead of disabling guidance.
- The math for `edit_cooldown_steps` seems strange, as it uses absolute values rather than values relative to the timesteps.
- Generally, not sure why an explicit `invert()` is even needed. It could easily be combined into the main pipeline `__call__` method, especially since its output is discarded and it sets values on the pipeline directly.
- There is a note in the docs (https://huggingface.co/docs/diffusers/main/en/api/pipelines/ledits_pp) that DPM++ 2M in diffusers is sub-optimal, but there is no information on why, and no example outputs.
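The fp16 failure in `invert()` is, per the review, a dtype mismatch: the VAE is upcast to float32 while the preprocessed image stays in float16. A minimal sketch of the usual remedy is to cast the input to the encoder's parameter dtype before encoding. The helper `encode_matching_dtype` is hypothetical, a small `nn.Conv2d` stands in for the VAE encoder, and float64/float32 stand in for float32/float16 so the sketch runs on CPU.

```python
import torch
import torch.nn as nn


def encode_matching_dtype(encoder: nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Run `encoder` on `image` after casting the image to the encoder's dtype.

    Hypothetical helper illustrating the fix; not the diffusers implementation.
    """
    param_dtype = next(encoder.parameters()).dtype
    return encoder(image.to(param_dtype))


# Stand-in for an upcast VAE encoder: the weights use a higher precision (float64)
# than the incoming image (float32), mirroring float32 weights vs. an fp16 image.
encoder = nn.Conv2d(3, 4, kernel_size=3, padding=1).to(torch.float64)
image = torch.randn(1, 3, 8, 8, dtype=torch.float32)

try:
    encoder(image)  # mismatched input/weight dtypes raise a RuntimeError
except RuntimeError as err:
    print("without cast:", type(err).__name__)

latents = encode_matching_dtype(encoder, image)
print("with cast:", latents.dtype, tuple(latents.shape))
```

Casting the input rather than the module keeps whatever precision the pipeline chose for the VAE, which is the point of the upcast in the first place.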
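The GPU-generator failure is a device mismatch between the sampled noise, `x0`, and the timestep tensor built with `torch.Tensor([t])`, which is always created on the CPU. A minimal sketch, assuming a hypothetical helper `prepare_noise_and_timestep`: sample on the generator's own device (a CUDA generator cannot drive CPU sampling), then move the noise and the timestep next to `x0`. It uses only standard PyTorch calls and also runs on CPU.

```python
from typing import Optional, Tuple

import torch


def prepare_noise_and_timestep(
    x0: torch.Tensor, t: int, generator: Optional[torch.Generator] = None
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Draw noise shaped like `x0` and build the timestep tensor on `x0`'s device.

    Hypothetical helper, not part of diffusers.
    """
    # Sample where the generator lives; a CUDA generator cannot drive CPU sampling.
    sample_device = generator.device if generator is not None else x0.device
    noise = torch.randn(x0.shape, generator=generator, device=sample_device, dtype=x0.dtype)
    # Move the noise next to x0 so add_noise() only ever sees one device.
    noise = noise.to(x0.device)
    # Unlike torch.Tensor([t]), which is created on the CPU, pin the timestep to x0's device.
    timestep = torch.tensor([t], device=x0.device)
    return noise, timestep


x0 = torch.zeros(1, 4, 8, 8)
gen = torch.Generator(device="cpu").manual_seed(0)
noise, timestep = prepare_noise_and_timestep(x0, 999, gen)
print(noise.device, timestep.device, timestep.tolist())
```

In the loop quoted in the review, the returned pair would then feed `self.scheduler.add_noise(x0, noise, timestep)`.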
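`Tensor.chunk()` and `Tensor.split()` return a tuple of tensors, and a tuple has no `.mean()`. A minimal sketch of a fix is to `torch.stack` the per-concept predictions before reducing. The rescale step follows the std-matching formula of diffusers' `rescale_noise_cfg`; averaging the concept predictions to form the reference, and the helper name `apply_guidance_rescale`, are assumptions for illustration.

```python
from typing import Sequence, Union

import torch


def rescale_noise_cfg(noise_cfg: torch.Tensor, noise_pred_text: torch.Tensor,
                      guidance_rescale: float = 0.0) -> torch.Tensor:
    # Match the per-sample std of the guided prediction to the reference prediction,
    # then blend the rescaled and unrescaled predictions.
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    rescaled = noise_cfg * (std_text / std_cfg)
    return guidance_rescale * rescaled + (1 - guidance_rescale) * noise_cfg


def apply_guidance_rescale(
    noise_pred: torch.Tensor,
    noise_pred_edit_concepts: Union[torch.Tensor, Sequence[torch.Tensor]],
    guidance_rescale: float,
) -> torch.Tensor:
    """Hypothetical helper: accept a tuple of concept predictions as well as a tensor."""
    if isinstance(noise_pred_edit_concepts, (tuple, list)):
        # chunk()/split() give a tuple; stack it into (num_concepts, batch, ...).
        noise_pred_edit_concepts = torch.stack(noise_pred_edit_concepts)
    concept_mean = noise_pred_edit_concepts.mean(dim=0)  # average over edit concepts
    return rescale_noise_cfg(noise_pred, concept_mean, guidance_rescale=guidance_rescale)


noise_pred = torch.randn(1, 4, 8, 8)
concepts = torch.randn(3, 4, 8, 8).chunk(3)  # a tuple, as in the failing code path
out = apply_guidance_rescale(noise_pred, concepts, guidance_rescale=0.7)
print(type(concepts).__name__, tuple(out.shape))
```

Accepting both a tensor and a tuple keeps the call site unchanged regardless of how the concept predictions were split.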
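Other diffusers pipelines treat a guidance scale of 1 or less as "classifier-free guidance off" and then run only the conditional branch, so there is nothing to split into unconditional and conditional halves. A minimal sketch of applying that convention to `source_guidance_scale`; `predict_noise` and the `unet_fn` callable are hypothetical stand-ins for the pipeline's UNet call, not diffusers APIs.

```python
from typing import Callable

import torch


def predict_noise(
    unet_fn: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
    latents: torch.Tensor,
    cond_emb: torch.Tensor,
    uncond_emb: torch.Tensor,
    guidance_scale: float,
) -> torch.Tensor:
    """Guided noise prediction that degrades to a single pass when guidance is off."""
    if guidance_scale > 1.0:
        # Batch unconditional and conditional inputs, then split the prediction in two.
        latent_in = torch.cat([latents, latents])
        emb_in = torch.cat([uncond_emb, cond_emb])
        noise_uncond, noise_cond = unet_fn(latent_in, emb_in).chunk(2)
        return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
    # Guidance disabled: one conditional pass, nothing to chunk.
    return unet_fn(latents, cond_emb)


def toy_unet(x: torch.Tensor, emb: torch.Tensor) -> torch.Tensor:
    # Toy "UNet" that just scales the latents by the embedding, for demonstration only.
    return x * emb


latents = torch.ones(1, 4, 2, 2)
cond, uncond = torch.full((1, 1, 1, 1), 2.0), torch.full((1, 1, 1, 1), 0.5)
for scale in (0.0, 1.0, 3.0):
    print(scale, predict_noise(toy_unet, latents, cond, uncond, scale).mean().item())
```

Skipping the unconditional pass also halves the UNet work whenever guidance is disabled.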
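One way to make warm-up and cool-down independent of the absolute schedule length is to resolve them against the number of denoising steps actually run, which is smaller than the nominal count when inversion uses `skip`. In this minimal sketch, integers are step counts and floats in [0, 1] are fractions of the run; both conventions and the helper names are assumptions, not the pipeline's current behavior.

```python
from typing import List, Optional, Union

StepSpec = Optional[Union[int, float]]


def resolve_step(value: StepSpec, num_steps: int, default: int) -> int:
    """Map a step spec to an index in [0, num_steps]; floats are fractions of the run."""
    if value is None:
        return default
    if isinstance(value, float):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"fractional step spec must be in [0, 1], got {value}")
        return round(value * num_steps)
    return max(0, min(int(value), num_steps))


def edit_active_mask(num_steps: int, warmup: StepSpec = None,
                     cooldown: StepSpec = None) -> List[bool]:
    """True for every step i with warmup <= i < cooldown, relative to the steps run."""
    start = resolve_step(warmup, num_steps, default=0)
    stop = resolve_step(cooldown, num_steps, default=num_steps)
    return [start <= i < stop for i in range(num_steps)]


# 40 steps actually run; edit guidance starts at step 2 and stops after 60% of the run.
mask = edit_active_mask(40, warmup=2, cooldown=0.6)
print(sum(mask), mask.index(True))
```

Because the window is computed from the steps actually run, changing `skip` or the step count shifts warm-up and cool-down proportionally instead of silently cutting them off.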

Mar 13 '24 18:03 vladmandic

Thanks for the feedback, @vladmandic!

@manuelbrack can we address them in the refactor PR too?

Mar 13 '24 18:03 yiyixuxu

One more item:

The pipeline fails in `aggregate_attention` on non-standard input image sizes, for example 872x512 or even a simple 560x560.
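A plausible cause (an assumption, since the failing code is not quoted here): the UNet's stride-2 downsampling convolutions round odd sizes up, so per-level attention-map sizes cannot be derived by plain integer division of the image size. For example, 560x560 gives a 70x70 latent with the usual VAE factor of 8, and the levels then go 70, 35, 18, whereas 70 // 4 = 17. The hypothetical helper `unet_level_resolutions` computes the sizes with ceiling division, assuming an SD-1.x-style UNet with three 3x3, stride-2, padding-1 downsamplers; reshaping a flattened `(batch, h*w, tokens)` attention map then needs exactly these `(h, w)` pairs.

```python
import math
from typing import List, Tuple

import torch


def unet_level_resolutions(
    height: int, width: int, vae_scale: int = 8, num_levels: int = 4
) -> List[Tuple[int, int]]:
    """Spatial (h, w) of each UNet resolution level for a given image size.

    Assumes a VAE downscaling factor `vae_scale` and 3x3, stride-2, padding-1
    downsampling convolutions between levels, which output ceil(size / 2).
    """
    h, w = height // vae_scale, width // vae_scale
    sizes = [(h, w)]
    for _ in range(num_levels - 1):
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        sizes.append((h, w))
    return sizes


def reshape_attention(attn: torch.Tensor, size: Tuple[int, int]) -> torch.Tensor:
    """Turn a (batch, h*w, tokens) attention map into (batch, h, w, tokens)."""
    batch, pixels, tokens = attn.shape
    h, w = size
    if pixels != h * w:
        raise ValueError(f"{pixels} query positions do not match a {h}x{w} map")
    return attn.reshape(batch, h, w, tokens)


for hw in [(512, 512), (560, 560), (872, 512)]:
    print(hw, unet_level_resolutions(*hw))
```

Deriving `(h, w)` per level this way, rather than assuming square maps or exact halving, is what lets non-square and non-power-of-two inputs reshape cleanly.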

Mar 14 '24 13:03 vladmandic

[Pipeline] Add LEDITS++ pipelines