
use 🧨diffusers model

keturn opened this issue 3 years ago · 55 comments

The goal is to reduce the amount of model code InvokeAI has to maintain by integrating https://github.com/huggingface/diffusers , using that to replace the existing ldm (descended from the original CompVis implementation).

I think the plan is that we keep the public APIs in ldm.invoke.generator stable while swapping out the implementations to be diffusers-based.

Discord discussion thread: https://discord.com/channels/1020123559063990373/1031668022294884392

[This is a continuation of #1384. The branch is now hosted in the InvokeAI repo instead of a fork for easier collaboration.]

Usage

Add a section to your models.yaml like this:

diffusers-1.5:
  description: Diffusers version of Stable Diffusion version 1.5
  format: diffusers
  repo_id: runwayml/stable-diffusion-v1-5

Note the format: diffusers.
The repo_id is as it appears on huggingface.co.
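As a sketch of what the loader side of this might look like (the helper name `select_diffusers_models` is hypothetical, not InvokeAI code), entries marked `format: diffusers` can be filtered out of the parsed models.yaml before deciding which loading path to use:

```python
def select_diffusers_models(models: dict) -> dict:
    """Return only the entries marked format: diffusers.

    `models` is the dict you get from parsing models.yaml; this helper is a
    hypothetical illustration, not InvokeAI's actual model-manager code.
    """
    return {
        name: cfg
        for name, cfg in models.items()
        if cfg.get("format") == "diffusers"
    }


models = {
    "diffusers-1.5": {
        "description": "Diffusers version of Stable Diffusion version 1.5",
        "format": "diffusers",
        "repo_id": "runwayml/stable-diffusion-v1-5",
    },
    "stable-diffusion-1.5": {
        "description": "legacy ckpt version",
        "format": "ckpt",
    },
}
```

Only the `diffusers-1.5` entry would then be handed to `StableDiffusionPipeline.from_pretrained` via its `repo_id`.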

Sub-Tasks

i.e. things keturn would love to delegate.

  • [x] #1690
  • [x] #1997
  • [x] #1777
  • [x] #1778
  • [ ] #2160
  • [ ] #1779
  • [ ] #1876
  • [ ] #2042

To Do: txt2img

  • [x] don't load redundant models! (i.e. both the ckpt and the diffusers formats)
  • [x] allow scheduler selection
  • [x] support extra scheduler parameters (e.g. DDIM's eta).
  • [x] honor float16 setting for loading model
  • [x] update callback users with the new signature.
    • at least done for invoke_ai_web_server. Not sure if the other instances are still in use?
  • [x] fix prompt fragment weighting. Refer to WeightedFrozenCLIPEmbedder.
  • [x] honor safety-checker setting
  • [x] allow loading a custom VAE
  • [x] make work with inpainting model
  • [ ] karras_max?
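For the scheduler-selection item above, a minimal sketch of how invoke's `-A` sampler names could map onto diffusers scheduler classes (the class names are real diffusers classes, but this table is illustrative; the actual mapping lives in the branch's generator code):

```python
# illustrative mapping from invoke sampler names to diffusers scheduler classes
SAMPLER_TO_SCHEDULER = {
    "ddim": "DDIMScheduler",
    "plms": "PNDMScheduler",
    "k_lms": "LMSDiscreteScheduler",
    "k_euler": "EulerDiscreteScheduler",
    "k_euler_a": "EulerAncestralDiscreteScheduler",
}


def scheduler_class_name(sampler_name: str) -> str:
    """Resolve a CLI sampler name to a diffusers scheduler class name."""
    try:
        return SAMPLER_TO_SCHEDULER[sampler_name]
    except KeyError:
        raise ValueError(f"unknown sampler: {sampler_name}")
```

Extra scheduler parameters such as DDIM's eta are handled separately in diffusers: they are passed at call time (e.g. `pipe(prompt, eta=0.0)`) rather than at scheduler construction.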

waiting on upstream diffusers

  • [x] fix seamless. (it looks like it's trying to work, but the padding is off somehow.) see https://github.com/huggingface/diffusers/issues/556
  • [ ] models.diffusion.cross_attention_control might be an obstacle, as that's not in stock diffusers yet and it meddles with some internals. The prompt-to-prompt authors do have a reference implementation that uses diffusers: https://nbviewer.org/github/google/prompt-to-prompt/blob/main/prompt-to-prompt_stable.ipynb

To Do: img2img

  • [ ] bug: crashes when strength is too low (zero timesteps; upstream bug in the img2img pipeline?)
  • [ ] make sure we use the correct seeded noise
  • [x] make work with inpainting model
  • [x] decide if we need to keep --inpaint_replace now that --strength works. (Or should it apply to the results of the infill method?)
    • decision: drop inpaint_replace
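On the low-strength crash above: the diffusers img2img pipeline derives how many timesteps to run from strength, roughly like the sketch below (an approximation of the upstream logic, not a copy of it), so a small enough strength leaves zero timesteps and the denoising loop has nothing to iterate:

```python
def img2img_timesteps(num_inference_steps: int, strength: float) -> list:
    # approximation of upstream diffusers img2img scheduling: strength
    # scales how many of the requested steps actually run
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return list(range(num_inference_steps))[t_start:]
```

With 30 steps, strength 0.5 runs 15 of them, while strength 0.02 truncates to zero steps, which matches the observed crash.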

To Do: txt2img2img (high-res optimization)

  • [x] rewrite to our diffusers-based pipeline

To Do: inpainting

  • [x] work with the inpainting-specific model
  • [x] make sure masks are being used correctly
  • [ ] remove this kludge that has something to do with inpainting? https://github.com/keturn/InvokeAI/blob/f49317c4f5e4a29c747aaaa433e617ff8ae98f13/ldm/generate.py#L1004-L1006
    • see https://github.com/invoke-ai/InvokeAI/pull/1583#issuecomment-1368339238

To Do: embiggen

  • [x] embiggen!
    • (I took a quick look: it mostly goes through invoke's generate API instead of directly accessing the model or schedulers, so hopefully doesn't need to change much.)

keturn avatar Nov 27 '22 17:11 keturn

~~currently getting `AttributeError: 'LatentDiffusion' object has no attribute 'image_from_embeddings'` doing txt2img~~

fixed, fix was "read the instructions in the top of the PR about putting a format: diffusers model in your models.yaml"

damian0815 avatar Nov 27 '22 17:11 damian0815

looks like this needs torch==1.13 on macOS, otherwise it crashes in diffusers lib with an error about views having to be contiguous

damian0815 avatar Nov 27 '22 18:11 damian0815

something weird is going on, because this is not `a cat playing with a ball in the forest -s 10 -S 1699080397 -W 512 -H 512 -C 7.5 -A k_lms`:

[image: 000823.1699080397]

k_euler produces the same composition

damian0815 avatar Nov 27 '22 18:11 damian0815

Seems to have worked on mps here with torch 1.12: https://github.com/invoke-ai/InvokeAI/actions/runs/3559296062/jobs/5978550236

[image: banana sushi output]

Does torch 1.13 on mac perform any better with this diffusers implementation? Or is it still much much slower than torch 1.12 with the old implementation?

keturn avatar Nov 27 '22 18:11 keturn

Seems to have worked on mps here with torch 1.12: https://github.com/invoke-ai/InvokeAI/actions/runs/3559296062/jobs/5978550236

that's actually x64, not m1/mps (check the install requirements step, calls x64 python) - github does not offer M1 hosts afaik.

Does torch 1.13 on mac perform any better with this diffusers implementation? Or is it still much much slower than torch 1.12 with the old implementation?

it seems slow but i haven't paid too much attention

damian0815 avatar Nov 27 '22 19:11 damian0815

something is definitely broken on mps, because this is supposedly `banana sushi -s 10 -S 42 -W 512 -H 512 -C 7.5 -A k_lms`:

[image: 000825.42]

damian0815 avatar Nov 27 '22 19:11 damian0815

apricot sushi produces exactly the same image, as does empty string. ok, so that means that on MPS the prompt embeddings tensor is being zero'd (or inf'd?) somehow. i'll look into it later.

damian0815 avatar Nov 27 '22 19:11 damian0815

that's actually x64, not m1/mps (check the install requirements step, calls x64 python) - github does not offer M1 hosts afaik.

yeah, hmm, I think you're right. Well that makes it very misleading to have a check named mac-mps-cpu

keturn avatar Nov 27 '22 20:11 keturn

there are shared-memory shenanigans going on in `self.clip_embedder.encode` (diffusers_pipeline.py line 309) on MPS that mean the second call overwrites the first-returned tensor. `.clone()` should fix it; testing now.
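The aliasing pattern described here can be demonstrated without MPS using a toy encoder that reuses one output buffer (purely illustrative; the real fix is calling `.clone()` on the tensor returned by `clip_embedder.encode`):

```python
class ReusedBufferEncoder:
    """Toy encoder that hands out the same buffer on every call,
    mimicking the suspected MPS aliasing behaviour."""

    def __init__(self):
        self._buf = [0.0] * 4

    def encode(self, text: str):
        for i in range(len(self._buf)):
            self._buf[i] = float(len(text) + i)
        return self._buf  # caller gets a live view, not a copy


enc = ReusedBufferEncoder()
uncond = enc.encode("")        # aliases the shared buffer
cond = enc.encode("banana")    # silently overwrites uncond too
aliased = uncond == cond       # True: both names see the same buffer

# the fix: copy each result before the next call (.clone() for tensors)
uncond = list(enc.encode(""))
cond = list(enc.encode("banana"))
```

After the defensive copies, the unconditioned and conditioned embeddings stay distinct, which is exactly what the `.clone()` patch restores on MPS.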

damian0815 avatar Nov 27 '22 20:11 damian0815

ok, there's some deep-in-the-weeds bug in pytorch, because:

`conditioned_next_x - unconditioned_next_x`

result: all zeros

`conditioned_next_x.clone() - unconditioned_next_x`

result: looks reasonable

i don't know why this is happening and i don't know what to do about it

damian0815 avatar Nov 27 '22 22:11 damian0815

Some change I just pulled in from development introduced a crash on model load, so I threw this in there as a stopgap: invoke-ai/InvokeAI@185aa24 (#1583)

If I'm reading things correctly, embedding_manager is currently a submodel defined by https://github.com/invoke-ai/InvokeAI/blob/8423be539b8968f82124360576a6b4c7957934ff/configs/stable-diffusion/v1-inference.yaml#L29-L30

The model configs like https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json have no such personalization_config.

Does it feel necessary to have that be data-driven, or is that class reference something we can hardcode?

keturn avatar Nov 29 '22 03:11 keturn

conditioned_next_x - unconditioned_next_x

result = all zeros

That's weird. Both on the same device and same dtype?

keturn avatar Nov 29 '22 03:11 keturn

development brought in a few more references to model.embedding_manager on the default code path, so I've patched over them in a few more places.

It was a kludge in one place before, but now it's spreading. Should probably be the next thing we tackle in this branch.

keturn avatar Nov 29 '22 22:11 keturn

I had it create an EmbeddingManager. I am not sure if it's working yet but at least it's back to not-crashing.

Also pushed a couple of fixes for deprecated diffusers things, cleaning up some of the warning messages it was spewing.

keturn avatar Nov 30 '22 05:11 keturn

realising i never answered your question about the personalization_config keturn - yes, i think it can be hardcoded

damian0815 avatar Nov 30 '22 19:11 damian0815

oof. re-resolving all the conflicts after the entirety of 2.2 was rebased was a doozy, but I think I did it okay.

keturn avatar Nov 30 '22 22:11 keturn

🚧 PLZ HOLD. DO NOT PUSH TO THIS BRANCH FOR A BIT, I WILL NEED TO FORCE-PUSH IT. 🚧

oh fiddlesticks.

this branch was based off of develop, so it contains that history.

but that history all got squashed away when it merged in to main.

that means I'm going to have to rebase this branch on main so it's not dragging in the duplicate history... okay.

keturn avatar Nov 30 '22 22:11 keturn

All clear.

As long as you're based off the current branch and not dragging in the old pre-2.2 development history, it's okay to push to here again.

keturn avatar Nov 30 '22 23:11 keturn

But just in case:

  • the branch based on old development is here: https://github.com/invoke-ai/InvokeAI/commit/6d6c03527c892ee6ed7a8c5232c1084ef2675d7e
  • one of the attempts at rebasing that on top of main is here: https://github.com/invoke-ai/InvokeAI/commit/89c6fbff495716d3be734243b4f8cffe4a8675e2
  • this branch is currently at https://github.com/invoke-ai/InvokeAI/commit/494936a8d24a09cb278a8e4520db3238b0e63125, which is made from 89c6fbff495716d3be734243b4f8cffe4a8675e2 but with some of the fiddly troubleshooting commits squashed out.

keturn avatar Nov 30 '22 23:11 keturn

I'd been thinking of doing "high res optimization" (txt2img2img) next, but after seeing how much the Unified Canvas relies on inpainting, and that it doesn't use txt2img2img, I'm working on invoke.generator.inpaint next.

This one certainly has a lot more code in it than img2img did!

keturn avatar Dec 01 '22 04:12 keturn

note to self for when self gets back to working on inpainting: you can distinguish the inpainting-specific model from the normal model by looking at unet.conv_in.in_channels. Normal models have 4, inpainting model has 9. (4 + 1-channel mask + 4-channel encoded masked image).

keturn avatar Dec 04 '22 09:12 keturn

I did some further work on inpaint, getting it working with the inpainting-specific model.

It's working well -- on the one test case I've thrown at it so far.

Still need to fix it so we can inpaint with the normal 4-channel model.

A few notes for next steps there:

  • I think we can combine the inpainting and img2img methods on the diffusers pipeline by treating img2img as inpainting with a flat mask.
  • This is maybe code that should be in InvokeAIDiffuserComponent? The use of the inpainting model is a modification to what gets passed to its model_forward_callback, and masking without the inpainting model is a change to the guidance function we apply to the model output.

keturn avatar Dec 05 '22 04:12 keturn

f3570d8 👀

damian0815 avatar Dec 05 '22 18:12 damian0815

Inpainting is working for me with both the normal model and runwayml inpainting model.

At least with DDIM, I haven't tested it broadly yet.

[It's also way too much code, but it's okay. We'll do a refactoring pass once we have a baseline with all the pieces working.]

Note I have not yet implemented inpaint_replace, because it seems to me it's redundant with strength. Do we need both?

keturn avatar Dec 05 '22 21:12 keturn

i do very much like the look of that last commit you made @keturn

damian0815 avatar Dec 07 '22 10:12 damian0815

Note I have not yet implemented inpaint_replace, because it seems to me it's redundant with strength. Do we need both?

@keturn - Inpaint replace is functionally different than strength. Can you clarify why it seems redundant?

hipsterusername avatar Dec 07 '22 19:12 hipsterusername

Inpaint replace is functionally different than strength. Can you clarify why it seems redundant?

Can you clarify how they're functionally different? #inpainting code in developer-forums discord.

keturn avatar Dec 07 '22 19:12 keturn

Masked img2img (inpainting) is now working well with DDIM, and is functional with DPM Solver++, but is a horrible mess with K-LMS and Euler schedulers. There are a few open PRs upstream related to img2img or inpainting; I'm thinking I can pause inpainting troubleshooting here until those are resolved and then we can sync up with the upstream implementation again.

  • https://github.com/huggingface/diffusers/pull/1583 (omg that's the same number as this PR! Twins!)
  • https://github.com/huggingface/diffusers/pull/1585

In other news, I opened an issue about model management. Not a showstopper at this point but something to keep an eye on.

  • https://github.com/huggingface/huggingface_hub/issues/1259

keturn avatar Dec 12 '22 03:12 keturn

@keturn what else needs doing on this? i know there's the embeddings stuff that i'm hoping to get to soon, but is there anything else (other than those issue requests you've opened already)?

damian0815 avatar Dec 12 '22 13:12 damian0815

@damian0815 I did have some trouble merging in the attention maps code from the main branch. Do we know how to hook that callback up here? cd358c4 (#1583)

keturn avatar Dec 12 '22 19:12 keturn