
[WIP] Generation backend rewrite

Open StAlKeR7779 opened this issue 1 year ago • 4 comments

Summary

A rewrite of the generation backend. At this point the backend has been rewritten to the new architecture, but from the user's perspective nothing should change, since the backend converts arguments silently. Further changes can follow in a later major update. Recreated after a conversation with @dunkeroni, based on #6548 and his ideas.

What was done here: the generation logic was simplified to its most basic steps, and injection points (modifiers/overrides) were added so that users can attach to them and change the generation logic (a rough sketch follows the list below). The following logic moved from the backend into extensions:

  • Generation latents preview
  • Rescale CFG
  • Handling inpainting/inpaint models
  • Guidance models - ControlNet, T2I Adapter, IP Adapter
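
To make the shape of this concrete, here is a minimal sketch of how an extension could attach to such injection points. The names used (`ExtensionBase`, `DenoiseContext`, `pre_step`/`post_step`, `PreviewExtension`) are illustrative placeholders, not the actual classes in this PR:

```python
# Illustrative sketch only: class, hook, and field names here are placeholders,
# not the API introduced by this PR.
from dataclasses import dataclass
from typing import Callable

import torch


@dataclass
class DenoiseContext:
    # Mutable state handed to extensions at each injection point.
    latents: torch.Tensor
    step_index: int
    total_steps: int


class ExtensionBase:
    # Extensions subclass this and override only the hooks they care about.
    def pre_step(self, ctx: DenoiseContext) -> None:
        pass

    def post_step(self, ctx: DenoiseContext) -> None:
        pass


class PreviewExtension(ExtensionBase):
    # Example: emit a latents preview after every step, instead of hard-coding
    # preview logic inside the backend loop.
    def __init__(self, callback: Callable[[torch.Tensor, int, int], None]):
        self._callback = callback

    def post_step(self, ctx: DenoiseContext) -> None:
        self._callback(ctx.latents.detach().clone(), ctx.step_index, ctx.total_steps)
```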

Related Issues / Discussions

None

QA Instructions

The old generation code has not been removed yet, so you can patch the `if` in `latents_from_embeddings` and run the old logic to check/compare.
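
For example, a temporary switch of roughly this shape (illustrative only; the real flag, branch, and helper names inside `latents_from_embeddings` will differ):

```python
# Illustrative only: the kind of temporary patch described above; the flag and
# helper method names here are hypothetical, not the PR's actual code.
USE_OLD_BACKEND = True  # flip to run the old code path and compare outputs

def latents_from_embeddings(self, *args, **kwargs):
    if USE_OLD_BACKEND:
        return self._old_latents_from_embeddings(*args, **kwargs)  # hypothetical
    return self._new_latents_from_embeddings(*args, **kwargs)  # hypothetical
```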

Merge Plan

Add tiled generation support. Remove old backend code, which currently remains for testing.

Checklist

  • [x] The PR has a short but descriptive title, suitable for a changelog
  • [ ] Tests added / updated (if applicable)
  • [ ] Documentation added / updated (if applicable)

@dunkeroni @RyanJDick

StAlKeR7779 avatar Jul 03 '24 23:07 StAlKeR7779

I took an initial read through this today. Before I leave any comments on the code, I think some more design discussion is warranted. I'll start the conversation here - but a new feature proposal ticket might be a better place for this.

First, let's clarify the goals with this PR. To me:

  1. Separation of code by feature for improved readability.
  2. Make it easier to add new features.
  3. Provide an interface for community node contributors to import and extend the diffusion process.

Is this list accurate? How important is 3? We'd go about this differently if we drop 3 as a goal.

Next, here are my top concerns with the current direction:

  • It is hard to anticipate how new features will want to modify the diffusion process. I fear that we will have to keep adding new injection points to support future features. If this happens, the abstraction of injection points will become a hindrance rather than a benefit.
  • Extensions may interfere with one another. It may become more difficult to handle interactions between extensions.
  • If we are aiming for Goal 3. then we need a clearer proposal for how this is going to work. Currently, diffusers_pipeline.py is not part of the public API, and it probably shouldn't be.

Let me know how you are thinking about these things. I'd like to get alignment in some form of design document before we get into the code on this one.

I'm not trying to hold up this effort. I just want to make sure that we've considered it thoroughly. If we're not careful, I think there is a real risk that we add a bunch of complexity and don't achieve the positive outcomes that we are aiming for.

RyanJDick avatar Jul 04 '24 22:07 RyanJDick

I would argue that 3 is vital, and not significantly more effort than 2. Invoke's position is already to not provide official support for everything under the sun, and requiring that node authors either make a clone of diffusers_pipeline.py or go use ComfyUI instead is a non-starter. Those are the only other options for the sort of features that this enables. 3 is also unavoidable if we want to make new features easier to add for official functions. There will need to be some generalized extension input on the node if we want to have support for ControlNet, and Controlllite, and ICLight, and the rest. Otherwise get ready for a Denoise Latents node with three dozen input connectors a few years from now.
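
To illustrate the point about input connectors (a sketch under my own naming, not actual Invoke node definitions): with per-feature inputs the denoise node's signature grows with every new guidance type, while a generalized extension input keeps it fixed:

```python
# Illustrative only: plain function signatures standing in for node definitions.

# Per-feature inputs: every new guidance type adds another connector.
def denoise_latents(latents, controlnet=None, t2i_adapter=None, ip_adapter=None):
    ...

# Generalized extension input: new guidance types plug in without signature changes.
def denoise_latents_modular(latents, extensions=()):
    ...
```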

This also isn't a new or uncharted idea. This PR is a (more technically advanced) version of the Modular Denoise Latents nodes that I have been working on and experimenting with for 8 months now. Those nodes were created because it was becoming infuriatingly complex to implement new research papers in Invoke while comparing or combining their effects. The architecture has been reworked a few times already to facilitate my many (often failed) experiments. As an example, our current compositing implementation in Canvas exists because of tests in that node pack.

As for interactions between extensions: we do not need to guarantee that all extensions are cross-compatible. We already don't guarantee that all nodes work with each other aside from type restrictions on what gets handed between them. The only hard restriction is that extensions which override and change the structure of the pipeline (e.g. tiled generation) cannot run with other extensions that attempt to override the same point. Most extensions only modify the data between process steps, and very few of them would actually break any others.
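
A rough sketch of that restriction (my own naming, not the PR's code): modifiers can stack freely, but at most one extension may claim a given override point:

```python
# Illustrative only: "override_points" is a hypothetical attribute naming the
# pipeline points an extension replaces (e.g. the step loop for tiled generation).
from collections import defaultdict


def validate_overrides(extensions):
    claimed = defaultdict(list)
    for ext in extensions:
        for point in getattr(ext, "override_points", ()):
            claimed[point].append(type(ext).__name__)
    conflicts = {point: names for point, names in claimed.items() if len(names) > 1}
    if conflicts:
        raise ValueError(f"Conflicting overrides: {conflicts}")
```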

dunkeroni avatar Jul 05 '24 22:07 dunkeroni

From my perspective, 3 is not the main goal of this PR; I am mostly thinking about simplification and further edits. Still, I agree with @dunkeroni about sharing this API with users, but I suggest doing it in these steps:

  1. In this PR we assume that the API is not shared, since we do not provide an extensions input on the node for users.
  2. When we make that big change and provide an extensions input, we can declare the API unstable/beta and subject to change.
  3. Later, after looking at more extensions (IC-Light, cnet++, ...), we can say with more confidence whether the API is still unstable.

While writing this, I found only two points in the API that are problematic/could change:

  1. What if we want to patch the UNet on the CPU, for example to trade speed for memory?
  2. What if an extension changes the attention processing, like IP-Adapter? We can't abstract the attention processor with confidence while we have only one such extension.

Other than this, I feel the API should already be stable, and going forward injection points will only be added, not changed.

StAlKeR7779 avatar Jul 08 '24 12:07 StAlKeR7779

I'm excited about how this is progressing 🚀

I took a stab at working backwards from this draft PR to a design document so that we can align on some of the important decisions being made here: https://www.notion.so/invokeai/Modular-Stable-Diffusion-Backend-Design-Document-e8952daab5d5472faecdc4a72d377b0d

DM me your email on Discord (ryanjdick) and I'll share the document with you.

@StAlKeR7779 @dunkeroni I'm hoping that you two can review that doc and help with filling in some of the incomplete/unresolved sections.

We have to treat this PR with some extra care, given its scale and scope, but I think the effort will be worth it.

RyanJDick avatar Jul 08 '24 21:07 RyanJDick