
[Pipeline] Add LEDITS++ pipelines

Open manuelbrack opened this issue 1 year ago • 4 comments

What does this PR do?

We add a set of pipelines implementing the LEDITS++ image editing method. We provide an implementation for StableDiffusion, SD-XL and DeepFloyd-IF. Additionally, we made some minor adjustments to the DPM-Solver scheduler to support image inversion.

There are still some obvious TODOs left that we would appreciate some help with:

  • [x] Finalize IF implementation
  • [x] Adjust input checks to the new image-editing setting (e.g. fail gracefully if inference is called before inversion)
  • [x] Add documentation
  • [x] Write unit tests

@patrickvonplaten, @apolinario, @linoytsaban you should all be able to commit to our fork. Would be great if you could help @kathath and me out a bit 😄

Who can review?

@patrickvonplaten

manuelbrack avatar Dec 06 '23 11:12 manuelbrack

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kathath , @manuelbrack Yay! can we move it out of draft? 🔥😁

linoytsaban avatar Jan 04 '24 16:01 linoytsaban

maybe @sayakpaul can take a look? 😄

linoytsaban avatar Jan 12 '24 12:01 linoytsaban

Can we give this another review (cc @yiyixuxu and also maybe @DN6 )

patrickvonplaten avatar Feb 13 '24 15:02 patrickvonplaten

this looks very promising, a few comments to polish it for production...

  • cannot make either sd or sdxl pipelines work with model cpu offloading
    as it doesn't pull text_encoder from cpu to gpu on invert()
  • LEditsPPPipelineStableDiffusionXL fails in invert().encode_image when running in fp16
    since VAE by default gets upcast causing float vs half runtime error
    workaround: pipe.vae.config.force_upcast = False
  • pipelines do not work with GPU torch.Generator()
    since tensor never gets moved to same device as generator created noise:
    xts[idx] = self.scheduler.add_noise(x0, noise, torch.Tensor([t]))
    
  • SDXL pipeline does not take width/height/clip_skip args like SD pipeline does
  • both pipelines do not set param pipe.num_timesteps which makes it hard to check progress in callbacks, etc.
    should be as simple as pipe.num_timesteps = len(pipe.inversion_steps)
  • guidance_rescale > 0 causes runtime error since noise_pred_edit_concepts is a tuple,
    not ndarray, so .mean() does not work
  • if source_guidance_scale is <=1, it causes runtime errors instead of disabling guidance
  • math for edit_cooldown_steps seems strange as it uses absolute values, not relative to timesteps
  • generally, not sure why explicit invert() is even needed,
    it can easily be combined in main pipeline __call__ method
    especially since its output is discarded and it sets values on pipeline directly
  • there is a note in docs https://huggingface.co/docs/diffusers/main/en/api/pipelines/ledits_pp that dpm++ 2m in diffusers is sub-optimal, but no info on why or example outputs?
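
To make two of the points above concrete (the `source_guidance_scale <= 1` case and the absolute `edit_cooldown_steps`), here is a rough sketch in plain Python. The helper names `guided_noise` and `resolve_cooldown_step` are hypothetical and not part of diffusers, and plain lists stand in for tensors; this only illustrates the behaviour being asked for, not how the pipelines are or should be implemented.

```python
from typing import List, Optional, Union


def guided_noise(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance on plain lists (stand-ins for tensors).

    Assumption: a scale of 1 or less means "guidance disabled", in which
    case the conditional prediction is returned unchanged instead of raising.
    """
    if scale <= 1.0:
        return list(cond)
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def resolve_cooldown_step(
    edit_cooldown_steps: Optional[Union[int, float]], num_steps: int
) -> int:
    """Turn a cooldown setting into an absolute step index.

    Assumption (hypothetical convention): a float in [0, 1] is a fraction of
    the total number of steps, an int is an absolute step count, and None
    means "no cooldown".
    """
    if edit_cooldown_steps is None:
        return num_steps
    if isinstance(edit_cooldown_steps, float) and 0.0 <= edit_cooldown_steps <= 1.0:
        return round(edit_cooldown_steps * num_steps)
    # Absolute values are clamped so they never exceed the schedule length.
    return min(int(edit_cooldown_steps), num_steps)
```

With a convention like this, `resolve_cooldown_step(0.8, 50)` and `resolve_cooldown_step(0.8, 20)` would both stop editing after 80% of the schedule, whatever the number of inversion steps.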

vladmandic avatar Mar 13 '24 18:03 vladmandic

thanks for the feedback! @vladmandic

@manuelbrack can we address them in the refactor PR too?

yiyixuxu avatar Mar 13 '24 18:03 yiyixuxu

one more item:

  • pipeline fails in aggregate_attention on non-standard input image sizes, for example 872x512 or even simple 560x560
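
Until that is fixed, a possible stopgap is to snap the input size down before calling the pipeline. The sketch below assumes the failure comes from sizes that are not divisible by some fixed factor; that is an unconfirmed guess, and 64 is used purely for illustration. `snap_resolution` is a hypothetical helper, not a diffusers function, and the width/height order of the sizes above is also assumed.

```python
from typing import Tuple


def snap_to_multiple(size: int, multiple: int = 64) -> int:
    """Round size down to the nearest multiple, never below one multiple."""
    return max(multiple, (size // multiple) * multiple)


def snap_resolution(width: int, height: int, multiple: int = 64) -> Tuple[int, int]:
    """Snap both dimensions; the factor of 64 is an assumption, not a documented requirement."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)
```

Under that assumption, 872x512 would become 832x512 and 560x560 would become 512x512 before being passed in.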

vladmandic avatar Mar 14 '24 13:03 vladmandic