diffusers
[Pipeline] Add LEDITS++ pipelines
What does this PR do?
We add a set of pipelines implementing the LEDITS++ image editing method. We provide implementations for Stable Diffusion, SDXL, and DeepFloyd IF. Additionally, we made some minor adjustments to the DPM-Solver scheduler to support image inversion.
There are still some obvious TODOs left that we would appreciate some help with:
- [x] Finalize IF implementation
- [x] Adjust input checks to new, image editing setting (e.g. fail gracefully if inference is called before inversion)
- [x] Add documentation
- [x] Write unit tests
@patrickvonplaten, @apolinario, @linoytsaban you should all be able to commit to our fork. Would be great if you could help @kathath and me out a bit 😄
Who can review?
@patrickvonplaten
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@kathath , @manuelbrack Yay! can we move it out of draft? 🔥😁
maybe @sayakpaul can take a look? 😄
Can we give this another review (cc @yiyixuxu and also maybe @DN6 )
this looks very promising, a few comments to polish it for production...

- cannot make either the SD or SDXL pipelines work with model cpu offloading, as it doesn't pull `text_encoder` from cpu to gpu on `invert()`
- `LEditsPPPipelineStableDiffusionXL` fails in `invert()` / `encode_image` when running in fp16, since the VAE by default gets upcast, causing a float vs half runtime error; workaround: `pipe.vae.config.force_upcast = False`
- pipelines do not work with a GPU `torch.Generator()`, since the timestep tensor never gets moved to the same device as the generator-created noise: `xts[idx] = self.scheduler.add_noise(x0, noise, torch.Tensor([t]))`
- SDXL pipeline does not take `width`/`height`/`clip_skip` args like the SD pipeline does
- both pipelines do not set `pipe.num_timesteps`, which makes it hard to check progress in callbacks, etc.; should be as simple as `pipe.num_timesteps = len(pipe.inversion_steps)`
- `guidance_rescale > 0` causes a runtime error since `noise_pred_edit_concepts` is a tuple, not ndarray, so `.mean()` does not work
- if `source_guidance_scale` is <= 1, it causes runtime errors instead of disabling guidance
- math for `edit_cooldown_steps` seems strange as it uses absolute values, not values relative to timesteps
- generally, not sure why an explicit `invert()` is even needed; it could easily be combined into the main pipeline `__call__` method, especially since its output is discarded and it sets values on the pipeline directly
- there is a note in the docs https://huggingface.co/docs/diffusers/main/en/api/pipelines/ledits_pp that DPM++ 2M in diffusers is sub-optimal, but no info on why or example outputs?
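A minimal sketch of one possible fix for the GPU `torch.Generator()` issue above (toy tensors for illustration, not the pipeline's actual code): `torch.Tensor([t])` always allocates on the CPU, so building the timestep tensor from the latents' device keeps everything on the same device.

```python
import torch

# Toy stand-ins, assuming the shapes only for illustration: the point is that
# torch.Tensor([t]) always allocates on the CPU, while torch.tensor([t],
# device=x0.device) follows the latents wherever they live (CPU or GPU).
x0 = torch.zeros(1, 4, 64, 64)                  # stand-in for inverted latents
t = 981                                         # stand-in for a diffusion timestep
timestep = torch.tensor([t], device=x0.device)  # same device as x0 by construction
assert timestep.device == x0.device
```

On a CUDA machine the same line keeps the timestep on the GPU, so the device mismatch inside `scheduler.add_noise` would not occur.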
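Similarly, a hedged sketch of how the `guidance_rescale` tuple issue could be handled (toy tensors; the real fix would live inside the pipeline): since `noise_pred_edit_concepts` is a tuple of tensors, stacking it into a single tensor first makes the `.mean()` reduction well-defined.

```python
import torch

# Toy stand-ins for the per-concept noise predictions; in the pipeline this is
# a tuple of tensors, so calling .mean() on it directly raises a runtime error.
noise_pred_edit_concepts = (torch.ones(2, 3), torch.zeros(2, 3))

stacked = torch.stack(noise_pred_edit_concepts)  # shape (2, 2, 3)
mean_pred = stacked.mean(dim=0)                  # elementwise mean over concepts
```

This is only one way to make the reduction valid; whether the mean over concepts is the semantics the rescaling actually wants is for the authors to decide.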
thanks for the feedback! @vladmandic
@manuelbrack can we address them in the refactor PR too?
one more item:

- pipeline fails in `aggregate_attention` on non-standard input image sizes, for example 872x512 or even a simple 560x560