stable-diffusion-webui
Support for runwayml In-painting SD model.
A simple addition to support the new in-painting model released here: https://github.com/runwayml/stable-diffusion
We update the stable-diffusion dependency to point to the new repo and pass the required additional inputs to the model. It requires extra masked-image and mask inputs, which act as visual conditioning for the model. Setting the mask to all 1s can also be used for txt2img generation.
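For illustration only, a hedged sketch of how those two extra inputs could be assembled; the function name and the encode_to_latent callable are placeholders, and the mask convention (1 = region to repaint, all ones for txt2img) follows the description above:

import torch

def make_inpaint_conditioning(image, mask, encode_to_latent):
    # image: (B, 3, H, W) source image in [-1, 1]; mask: (B, 1, H, W), 1 = region to repaint.
    # encode_to_latent: callable mapping an image to its (B, 4, h, w) latent (e.g. the VAE encoder).
    masked_image = image * (1.0 - mask)                   # hide the region that will be repainted
    masked_image_latent = encode_to_latent(masked_image)  # 4 channels of image context
    latent_mask = torch.nn.functional.interpolate(mask, size=masked_image_latent.shape[-2:])  # 1 channel
    # Concatenated channel-wise with the 4-channel noise latent -> 9 UNet input channels.
    return torch.cat([latent_mask, masked_image_latent], dim=1)

Passing an all-ones mask reduces this to the txt2img case described above.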
Implemented
- K-Diffusion txt2img
- K-Diffusion img2img
- K-Diffusion inpaint
TODO
- VanillaStableDiffusionSampler updates
- Add a flag to detect if we need to create the masked tensors to save some memory.
- Fix the use_ema: False config option. Currently need to add use_ema: False in sd-v1-5-inpainting.yaml, otherwise the checkpoint will not load.
Have you tested the vanilla 1.4 model with this PR?
If the config .yaml needs to be changed, you can ship a config and use shared.cmd_opts.config to use that new config when loading the Runway model.
what is that extra masked-image?
Have you tested the vanilla 1.4 model with this PR?
Yes, I observe matching seed parity with the CompVis stable-diffusion repo. The only code path where the visual conditioning is used is the new hybrid conditioning, so it shouldn't affect any crossattn models. Although it might be worth it to only create the masks when they are actually needed.
https://github.com/runwayml/stable-diffusion/blob/main/ldm/models/diffusion/ddpm.py#L1431
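A hedged sketch of that lazy-creation idea (illustrative only; the attribute check relies on the conditioning_key stored on the CompVis DiffusionWrapper, and the dict layout mirrors the hybrid conditioning used by this model):

import torch

def maybe_add_image_conditioning(model, cond, latent):
    # Only allocate the dummy mask / masked-image tensors for hybrid-conditioned models;
    # plain cross-attention models (1.4, vanilla 1.5, WD) are left untouched.
    if getattr(model.model, "conditioning_key", None) != "hybrid":
        return cond
    b, _, h, w = latent.shape
    dummy_mask = latent.new_ones(b, 1, h, w)           # all ones = treat it as txt2img
    dummy_masked_image = latent.new_zeros(b, 4, h, w)  # no image context
    return {
        "c_crossattn": [cond],
        "c_concat": [torch.cat([dummy_mask, dummy_masked_image], dim=1)],
    }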
If the config .yaml needs to be changed, you can ship a config and use shared.cmd_opts.config to use that new config when loading the Runway model.
Ideally the config should not need to be changed. I originally misattributed the bug. LatentInpaintDiffusion in the yaml is fine, but the original sd-v1-5-inpainting.yaml is missing use_ema: False. This causes the checkpoint to be loaded incorrectly, effectively not loading the checkpoint at all.
what is that extra masked-image?
It provides the network with contextual information about the original image. Presumably this allows it to better fine-tune the in-painting, creating a more coherent image.
@random-thoughtss You can do sd_config.model.params.use_ema = False in sd_models.py after OmegaConf.load
I'm on the random-thoughtss branch, monkey-patched sd_config.model.params.use_ema = False into sd_models.py, and 1.4 loads now; the size mismatch persists for the "1.5" inpaint model.
Caveat: torch 1.12.1+rocm5.1, but it usually doesn't matter.
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/routes.py", line 275, in run_predict
output = await app.blocks.process_api(
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 787, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/gradio/blocks.py", line 694, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1633, in <lambda>
fn=lambda value, k=k: run_settings_single(value, key=k),
File "/home/cornpop/ml/stable-diffusion-webui/modules/ui.py", line 1488, in run_settings_single
opts.data_labels[key].onchange()
File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 40, in f
res = func(*args, **kwargs)
File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 85, in <lambda>
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights(shared.sd_model)))
File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 252, in reload_model_weights
load_model_weights(sd_model, checkpoint_info)
File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 169, in load_model_weights
missing, extra = model.load_state_dict(sd, strict=False)
File "/home/cornpop/conda/envs/shit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
That’s most likely due to our repo using the CompVis config. Try also adding:
sd_config.model.params.conditioning_key = hybrid
I think this model could also be used for outpainting with great effect.
sd_config = OmegaConf.load(checkpoint_info.config)
###monkey
sd_config.model.params.use_ema = False
sd_config.model.params.conditioning_key = hybrid
###
sd_model = instantiate_from_config(sd_config.model)
Vanilla python webui.py:
Traceback (most recent call last):
File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 161, in <module>
webui(cmd_opts.api)
File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 122, in webui
initialize()
File "/home/cornpop/ml/stable-diffusion-webui/webui.py", line 84, in initialize
shared.sd_model = modules.sd_models.load_model()
File "/home/cornpop/ml/stable-diffusion-webui/modules/sd_models.py", line 215, in load_model
sd_config.model.params.conditioning_key = hybrid
NameError: name 'hybrid' is not defined
Change hybrid to "hybrid"
size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
I give up for now. Non-programmer trashing up the collabo isn't going to do any good.
Actually, that shouldn't happen. @random-thoughtss When you tested 1.4, did you change the model dimensions to match 1.4 inside the config?
We shouldn't break compatibility with 1.4, as 1.5 (which will release very soon now) uses the same dimensions.
@AUTOMATIC1111 Curious to hear your thoughts on this model.
My thinking is like this:
- Load the normal model at all times (whether that's vanilla 1.4, 1.5, WD or whatever)
- Add a checkbox to outpainting & inpainting
- If the user checks this checkbox, load the RunwayML model, run inference, unload (maybe dependent on a user setting).
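Roughly, the proposed flow could look like this (every name below is a placeholder for illustration, not an existing webui API):

class ModelSlot:
    # Placeholder for a loaded checkpoint; stands in for the currently loaded model.
    def __init__(self, name):
        self.name = name
    def infer(self, job):
        return f"{self.name}: {job}"

def load_checkpoint(name):
    return ModelSlot(name)

def run_job(job, use_inpaint_model, normal_name="sd-v1-4.ckpt", keep_loaded=False):
    if not use_inpaint_model:
        # Checkbox unticked: keep using the normal model (1.4, 1.5, WD, ...).
        return load_checkpoint(normal_name).infer(job)
    runway = load_checkpoint("sd-v1-5-inpainting.ckpt")  # swap in on demand
    result = runway.infer(job)
    if not keep_loaded:
        # "maybe dependent on a user setting": swap the normal model back in afterwards.
        load_checkpoint(normal_name)
    return result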
sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
sd_config.model.params.use_ema = False
sd_config.model.params.conditioning_key = "hybrid"
sd_config.model.params.unet_config.params.in_channels = 9
This is all that's needed to load it as-is. I've had better results outpainting with this model than inpainting, but that's probably a skill issue. (Hilariously, poor man's outpainting seems to work better than outpainting mk2 with this model.)
We also don't need to switch to the RunwayML repo for this. We can continue our proud tradition of hijacking the CompVis repo. I wrote some working code performing just that.
oxy: switching to a different repo is a big step. I need to grab his branch and check if it really is a lot better; then there can be some considerations.
also is the sd 1.5 the finetuned 1.5 model that emad keeps from being released?
We don’t need to switch repos. I wrote working hijacking code for this.
1.5 is (much like 1.4) just 1.2 but further along training.
1.4 is resumed from 1.2 and trained for ~270k steps I think, and 1.5 ~600k
that emad keeps from being released?
yes @AUTOMATIC1111
+modules/sd_hijack_loading.py
import math
import os
import sys
import traceback
import torch
import numpy as np
from einops import rearrange, repeat
from omegaconf import ListConfig
from modules import shared
import ldm.models.diffusion.ddpm
from ldm.models.diffusion.ddpm import LatentDiffusion
from ldm.util import exists

@torch.no_grad()
def get_unconditional_conditioning(self, batch_size, null_label=None):
    if null_label is not None:
        xc = null_label
        if isinstance(xc, ListConfig):
            xc = list(xc)
        if isinstance(xc, dict) or isinstance(xc, list):
            c = self.get_learned_conditioning(xc)
        else:
            if hasattr(xc, "to"):
                xc = xc.to(self.device)
            c = self.get_learned_conditioning(xc)
    else:
        # todo: get null label from cond_stage_model
        raise NotImplementedError()
    c = repeat(c, "1 ... -> b ...", b=batch_size).to(self.device)
    return c

class LatentInpaintDiffusion(LatentDiffusion):
    def __init__(
        self,
        concat_keys=("mask", "masked_image"),
        masked_image_key="masked_image",
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.masked_image_key = masked_image_key
        assert self.masked_image_key in concat_keys
        self.concat_keys = concat_keys

    @torch.no_grad()
    def get_input(
        self, batch, k, cond_key=None, bs=None, return_first_stage_outputs=False
    ):
        # note: restricted to non-trainable encoders currently
        assert (
            not self.cond_stage_trainable
        ), "trainable cond stages not yet supported for inpainting"
        z, c, x, xrec, xc = super().get_input(
            batch,
            self.first_stage_key,
            return_first_stage_outputs=True,
            force_c_encode=True,
            return_original_cond=True,
            bs=bs,
        )

        assert exists(self.concat_keys)
        c_cat = list()
        for ck in self.concat_keys:
            cc = (
                rearrange(batch[ck], "b h w c -> b c h w")
                .to(memory_format=torch.contiguous_format)
                .float()
            )
            if bs is not None:
                cc = cc[:bs]
                cc = cc.to(self.device)
            bchw = z.shape
            if ck != self.masked_image_key:
                # resize the binary mask to the latent resolution
                cc = torch.nn.functional.interpolate(cc, size=bchw[-2:])
            else:
                # encode the masked image into latent space
                cc = self.get_first_stage_encoding(self.encode_first_stage(cc))
            c_cat.append(cc)
        c_cat = torch.cat(c_cat, dim=1)
        all_conds = {"c_concat": [c_cat], "c_crossattn": [c]}
        if return_first_stage_outputs:
            return z, all_conds, x, xrec, xc
        return z, all_conds

def do_hijack():
    ldm.models.diffusion.ddpm.get_unconditional_conditioning = get_unconditional_conditioning
    ldm.models.diffusion.ddpm.LatentInpaintDiffusion = LatentInpaintDiffusion
sd_models.py

from modules.sd_hijack_loading import do_hijack

in load_model:

if str(checkpoint_info.filename).endswith("inpainting.ckpt"):
    do_hijack()
    sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
    sd_config.model.params.use_ema = False
    sd_config.model.params.conditioning_key = "hybrid"
    sd_config.model.params.unet_config.params.in_channels = 9
Since you researched it, do you mind writing a paragraph or so about what it does differently, apart from using a new model?
I haven't researched this model very long. As far as I can see, it adds 5 (1 + 4) new input channels for inpainting and fine-tunes for that.
Personally, I think it's a big improvement for outpainting, at least.
Oh, do you mean the code? Not much, the star of the show is the model. The code is almost entirely enablement code.
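To make that channel arithmetic concrete (and to show where the size-mismatch error quoted above comes from), here is a hedged toy example: the inpainting UNet's first convolution expects 4 + 1 + 4 = 9 input channels, while a vanilla SD UNet expects only 4.

import torch

b, h, w = 1, 64, 64
noise_latent = torch.randn(b, 4, h, w)         # the usual 4 latent channels
mask = torch.ones(b, 1, h, w)                  # 1 extra mask channel
masked_image_latent = torch.zeros(b, 4, h, w)  # 4 extra masked-image latent channels

unet_input = torch.cat([noise_latent, mask, masked_image_latent], dim=1)
first_conv = torch.nn.Conv2d(9, 320, kernel_size=3, padding=1)  # weight shape [320, 9, 3, 3]
print(first_conv(unet_input).shape)  # torch.Size([1, 320, 64, 64])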
Here's an outpainting result (poor man's outpainting, 100 steps)
[image: outpainting result]
It can even outpaint twice without breaking down, something I've never been able to do with raw SD.
[image: second outpainting pass]
I should have probably mentioned that the original config for the in-painting model was not released alongside the checkpoint, but it can be found here: https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml
This config works with the current repo, with the additional use_ema: False.
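For reference, a minimal sketch of applying that extra flag in code instead of editing the .yaml (the config path is illustrative, and it assumes an ldm package that actually provides LatentInpaintDiffusion, i.e. the RunwayML repo or the hijack above):

from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

sd_config = OmegaConf.load("configs/stable-diffusion/v1-inpainting-inference.yaml")  # illustrative path
sd_config.model.params.use_ema = False  # the one addition the published config is missing
sd_model = instantiate_from_config(sd_config.model)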
sd_config.model.target = "ldm.models.diffusion.ddpm.LatentInpaintDiffusion"
sd_config.model.params.use_ema = False
sd_config.model.params.conditioning_key = "hybrid"
sd_config.model.params.unet_config.params.in_channels = 9
These manual changes by @C43H66N12O12S2 replicate all of the changes RunwayML made to their config. Would it be better to
- hard-code these changes in the monkey patch?
- Provide instructions on how to change the RunwayML config?
- Force just use_ema and let the user figure out the config?
Just a side note: reload_model_weights needs to be modified as well, or switching won't work if the initial model is a "normal" model. The easiest, if not elegant, way to achieve that would be:
if sd_model.sd_checkpoint_info.config != checkpoint_info.config or checkpoint_info.filename.endswith("inpainting.ckpt"):
Actually, the reverse will fail as well (switching from runway to any other model with 4 channels)
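A hedged sketch of a check that covers both directions (should_hijack_inpainting is an illustrative helper, not existing webui code):

def should_hijack_inpainting(checkpoint_info):
    return str(checkpoint_info.filename).endswith("inpainting.ckpt")

def needs_full_model_reload(sd_model, checkpoint_info):
    # Reload when the config changes, or when we cross the boundary between the
    # inpainting checkpoint and a normal 4-channel checkpoint in either direction.
    return (
        sd_model.sd_checkpoint_info.config != checkpoint_info.config
        or should_hijack_inpainting(checkpoint_info) != should_hijack_inpainting(sd_model.sd_checkpoint_info)
    )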
Also, we should add credit to the RunwayML repo in sd_hijack_loading.py
Aside from those minor adjustments, this PR is close to ready. Just need to support vanilla samplers.
Seems to not work with the txt2img hires fix, but that's not the use case for this model anyway.
Hmm... if I check out
c6f4a873d7c8a916814e3201044b84b72e09769a
and save https://raw.githubusercontent.com/runwayml/stable-diffusion/main/configs/stable-diffusion/v1-inpainting-inference.yaml (with the additional use_ema: False parameter)
as {models}/sd-v1-5-inpainting.yaml,
I get the error:
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 9, 3, 3], expected input[2, 4, 64, 64] to have 9 channels, but got 4 channels instead
Were there other changes needed to get this working?
Why was this closed? Is there another version in the works?
bump, we need this outpainting quality, it's crazy good
[image: outpainting example]
Why was this closed? Is there another version in the works?
Because the merge was totally botched. This needs a deep cleanup.
Follow https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3192 for the proper PR.
Yup, this repo got messed up. The new PR continues the work.
@AUTOMATIC1111 GitHub support says they can remove the dead commits from the PR and keep the discussion if you permit it.