
Add support for RunwayML In-painting Model

Open random-thoughtss opened this issue 3 years ago • 22 comments

Apologies about the mess, I wrecked my repo trying to merge changes from upstream. It was easier to just make a new one.

This commit adds support for the v1.5 in-painting model located here: https://github.com/runwayml/stable-diffusion

We can use this model in all modes (txt2img and img2img) simply by providing a mask for each mode.

Working

  1. K-Diffusion img2img, txt2img, inpainting
  2. DDIM txt2img, img2img, inpainting
  3. Switching models between in-painting and regular v1.4

TODO

  1. Test all sorts of variations, image sizes, etc.

random-thoughtss avatar Oct 19 '22 21:10 random-thoughtss

@random-thoughtss For creating masks when necessary: if I'm right that this model requires masks in all modes, at all times, wouldn't a simple check against the checkpoint filename suffice?

Amazing work, by the way.

C43H66N12O12S2 avatar Oct 19 '22 21:10 C43H66N12O12S2

@C43H66N12O12S2 I decided to just check for the attention method. Sadly it looks like a lot of the stable-diffusion code assumes that there is something in the cond so setting it to None will require a couple of changes. For now I'll just initialize a zero latent tensor, they generally don't take up that much space and it skips the image encoder call.
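For reference, a minimal numpy sketch of the zero-latent idea (shapes and function names here are illustrative, not webui's actual API):

```python
import numpy as np

# Hypothetical sketch: instead of running the image encoder to fill the extra
# conditioning channels, allocate zeros of the right shape. The inpainting
# UNet conditions on a 4-channel masked-image latent plus a 1-channel mask.
def make_dummy_image_conditioning(batch_size, latent_h, latent_w):
    masked_image_latent = np.zeros((batch_size, 4, latent_h, latent_w), dtype=np.float32)
    mask = np.zeros((batch_size, 1, latent_h, latent_w), dtype=np.float32)
    # 5 conditioning channels total, all zero: the image encoder call is skipped
    return np.concatenate([masked_image_latent, mask], axis=1)

cond = make_dummy_image_conditioning(1, 64, 64)  # shape (1, 5, 64, 64)
```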

random-thoughtss avatar Oct 19 '22 22:10 random-thoughtss

The minuscule amount of VRAM zero tensors consume is no big deal. Thanks for adding support for hires fix. IMO this PR is ready for merge. @AUTOMATIC1111

For anybody interested, this model does exceptionally well with NAI models. [attached example: "a woman in a black outfit with a red cape on her shoulders and a red necklace on her neck, by theCHAMBA"]

C43H66N12O12S2 avatar Oct 19 '22 22:10 C43H66N12O12S2

Thank you for the PR. One thing I noticed: setting Inpaint at full resolution to True breaks inpainting.

terminoldman avatar Oct 20 '22 03:10 terminoldman

Thank you for the PR. One thing I noticed: setting Inpaint at full resolution to True breaks inpainting.

@genesst I could not reproduce on my end.

What is the error? What model and sampler are you using?

random-thoughtss avatar Oct 20 '22 03:10 random-thoughtss

How could you add support for other models? I suspect just merging the models might not be enough?

kantsche avatar Oct 20 '22 03:10 kantsche

PLZ AUTOMATIC1111 CHECK THIS ONE OUT ?

GhostDragon69 avatar Oct 20 '22 07:10 GhostDragon69

Can we use this before it's committed to the main branch?

aniketgore avatar Oct 20 '22 10:10 aniketgore

To use this, do you switch models like any other model?

chekaaa avatar Oct 20 '22 13:10 chekaaa

The minuscule amount of VRAM zero tensors consume is no big deal. Thanks for adding support for hires fix. IMO this PR is ready for merge. @AUTOMATIC1111

How much VRAM are we talking about here? Some people have very slim margins between their maximum possible rendering resolution and OOM errors.

ProGamerGov avatar Oct 20 '22 14:10 ProGamerGov

Tested it. It works amazingly well. For those who are wondering how to use it:

  • First, switch to this pull request's branch in git.
  • Download the runwayml model and place it in the same place you put your SD model files.
  • Select the inpainting model from the dropdown at the top left corner of the page in the web interface.
  • Enjoy inpainting.

SMUsamaShah avatar Oct 20 '22 16:10 SMUsamaShah

@ProGamerGov It would take up a little under 81KB for the standard sized image at fp32. However, looking at it, it shouldn't care what the size of the dummy latent image is, just that the batch size is correct. It should be enough to make a dummy 1x1 image, meaning it'll only take up an extra 20 bytes per image at fp32.
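The arithmetic behind those numbers, assuming 4 masked-image latent channels plus 1 mask channel at fp32 (4 bytes per element):

```python
# 4 masked-image latent channels + 1 mask channel = 5 conditioning channels,
# each element taking 4 bytes at fp32.
def dummy_cond_bytes(latent_h, latent_w, channels=5, bytes_per_elem=4):
    return channels * latent_h * latent_w * bytes_per_elem

standard = dummy_cond_bytes(64, 64)  # a 512x512 image maps to a 64x64 latent
tiny = dummy_cond_bytes(1, 1)        # the proposed 1x1 dummy latent
print(standard, tiny)  # 81920 bytes vs 20 bytes per image
```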

random-thoughtss avatar Oct 20 '22 16:10 random-thoughtss

@ProGamerGov It would take up a little under 81KB for the standard sized image at fp32. However, looking at it, it shouldn't care what the size of the dummy latent image is, just that the batch size is correct. It should be enough to make a dummy 1x1 image, meaning it'll only take up an extra 20 bytes per image at fp32.

That sounds fine then!

ProGamerGov avatar Oct 20 '22 16:10 ProGamerGov

How could you add support for other models? I suspect just merging the models might not be enough?

@ryukra The weights have a completely different first layer, so you can't just merge models together. I guess you could try dropping the extra 5 inputs from the input layer's weights; no idea how the network would react.
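To make the "drop the extra inputs" idea concrete, here is a hedged numpy sketch. The shapes are assumed from the SD v1 UNet, where the inpainting variant's first conv takes 9 input channels (4 noise + 4 masked-image latent + 1 mask) instead of the regular model's 4:

```python
import numpy as np

# Illustrative only: slice the first conv's weight tensor from 9 input
# channels down to the regular model's 4. Whether the resulting network
# produces anything sensible is untested.
def drop_extra_input_channels(conv_in_weight):
    out_ch, in_ch, kh, kw = conv_in_weight.shape
    assert in_ch == 9, "expected an inpainting-model input layer"
    return conv_in_weight[:, :4, :, :]  # keep only the noise-latent channels

w_inpaint = np.random.randn(320, 9, 3, 3).astype(np.float32)
w_sliced = drop_extra_input_channels(w_inpaint)  # shape (320, 4, 3, 3)
```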

It looks like they have some training code using diffusers available here: https://huggingface.co/runwayml/stable-diffusion-inpainting but the "generate synthetic masks and in 25% mask everything" pipeline is not implemented anywhere.

random-thoughtss avatar Oct 20 '22 16:10 random-thoughtss

I've downloaded and selected the runwayml model, but I'm still getting bad results. Could someone please explain their process for getting good in/out painting?

  • Are you using 1 of the custom scripts?
  • Are you manually modifying your input images to have transparency / are you creating custom masks?
  • What dimensions & generation settings are you using?

A short video would be super helpful!

ArrowM avatar Oct 20 '22 16:10 ArrowM

this PR breaks all other models in DDIM mode

  File "stable-diffusion-webui/modules/sd_samplers.py", line 240, in <lambda>
    samples_ddim = self.launch_sampling(steps, lambda: self.sampler.sample(S=steps+1, conditioning=conditioning, batch_size=int(x.shape[0]), shape=x[0].shape, verbose=False, unconditional_guidance_scale=p.cfg_scale, unconditional_conditioning=unconditional_conditioning, x_T=x, eta=self.eta)[0])
  File "miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "stable-diffusion-webui/repositories/stable-diffusion/ldm/models/diffusion/ddim.py", line 84, in sample
    cbs = conditioning[list(conditioning.keys())[0]].shape[0]
AttributeError: 'list' object has no attribute 'shape'

remixer-dec avatar Oct 20 '22 19:10 remixer-dec

@remixer-dec Took some time to replicate, but it turns out that if you never load the in-painting model, the DDIM methods never get replaced. Made sure to always replace them. Thanks for spotting that!

Should work now even when never loading an in-painting model.
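The failure mode and fix can be illustrated with a toy monkey-patch (these classes are stand-ins, not webui's actual sampler code): patching at the class level means every DDIM sampler instance picks up the replacement, whether or not an in-painting model was ever loaded.

```python
# Stand-in for the upstream DDIM sampler whose method needs replacing.
class DDIMSampler:
    def cond_batch_size(self, conditioning):
        # upstream code assumes tensor-like conditioning; a plain list
        # raises AttributeError: 'list' object has no attribute 'shape'
        return conditioning.shape[0]

# Replacement that tolerates the list conditioning the webui passes.
def patched_cond_batch_size(self, conditioning):
    if isinstance(conditioning, list):
        return len(conditioning)
    return conditioning.shape[0]

# Patch the class, not an instance, so the fix applies unconditionally --
# including for samplers created before any in-painting model is loaded.
DDIMSampler.cond_batch_size = patched_cond_batch_size
```

With the patch in place, `DDIMSampler().cond_batch_size(["a", "b"])` returns 2 instead of raising.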

random-thoughtss avatar Oct 20 '22 20:10 random-thoughtss

I've noticed whilst trying to produce comparisons that the inpainting model does not work when using the X/Y Plot.

If you specify the inpainting model within an X/Y plot under the "Checkpoint" section, when it gets to that model it does not load it and instead uses the previously used model again. Even the console seems to get confused.

This is an image I inpainted; the goal was to add an extra tower on the far right of the image. [screenshot attached]

You can see here that the output from the 1.5 model and the inpainting model are identical; the console output, however, tells a different story:

  X/Y plot will create 6 images on a 1x3 grid. (Total steps to process: 240)
  100%|██████████| 31/31 [00:05<00:00, 6.01it/s]
  100%|██████████| 31/31 [00:05<00:00, 5.78it/s]
  Loading weights [a9263745] from D:\Code\Stable-Diffusion\AUTOMATIC1111\stable-diffusion-webui\models\Stable-diffusion\v1-5-pruned.ckpt
  Global Step: 840000
  Applying cross attention optimization (Doggettx).
  Weights loaded.
  100%|██████████| 31/31 [00:05<00:00, 5.86it/s]
  100%|██████████| 31/31 [00:05<00:00, 5.58it/s]
  LatentDiffusion: Running in eps-prediction mode
  DiffusionWrapper has 859.52 M params.
  making attention of type 'vanilla' with 512 in_channels
  Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
  making attention of type 'vanilla' with 512 in_channels
  Loading weights [7460a6fa] from D:\Code\Stable-Diffusion\AUTOMATIC1111\stable-diffusion-webui\models\Stable-diffusion\SD-v1-4.ckpt
  Global Step: 470000
  Applying cross attention optimization (Doggettx).
  Model loaded.
  100%|██████████| 31/31 [00:10<00:00, 2.93it/s]
  100%|██████████| 31/31 [00:07<00:00, 4.21it/s]
  Loading weights [7460a6fa] from D:\Code\Stable-Diffusion\AUTOMATIC1111\stable-diffusion-webui\models\Stable-diffusion\SD-v1-4.ckpt
  Global Step: 470000
  Applying cross attention optimization (Doggettx).
  Weights loaded.

You can see that it starts the plot; the SD-v1-4 model is already loaded, so it does not load anything new. It does 2 images and then loads the v1-5-pruned model. After this it should be selecting the inpainting model, but it goes back to loading the SD-v1-4 model instead.

But that obviously doesn't match up with the image: the v1-5-pruned and sd-v1-5-inpainting outputs are identical, while the SD-v1-4 ones are completely different.

Also, if you have the Inpainting model already loaded and then try to do an X/Y Plot with other models, it just ignores those models and only uses the Inpainting model.

I believe it may have something to do with it detecting the inpainting model and using "LatentInpaintDiffusion" instead of the usual "LatentDiffusion".

Example of what happens when using the X/Y Plot whilst the inpainting model is your currently selected model: [screenshot attached]

Outputs are identical because it's only using the inpainting model, but you can see from the screenshot that it thinks it's using the other models.

  X/Y plot will create 2 images on a 2x1 grid. (Total steps to process: 40)
  LatentInpaintDiffusion: Running in eps-prediction mode
  DiffusionWrapper has 859.54 M params.
  making attention of type 'vanilla' with 512 in_channels
  Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
  making attention of type 'vanilla' with 512 in_channels
  Loading weights [3e16efc8] from D:\Code\Stable-Diffusion\AUTOMATIC1111\stable-diffusion-webui\models\Stable-diffusion\sd-v1-5-inpainting.ckpt
  Global Step: 440000
  Applying cross attention optimization (Doggettx).
  Model loaded.
  100%|██████████| 20/20 [00:02<00:00, 9.49it/s]
  LatentInpaintDiffusion: Running in eps-prediction mode
  DiffusionWrapper has 859.54 M params.
  making attention of type 'vanilla' with 512 in_channels
  Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
  making attention of type 'vanilla' with 512 in_channels
  Loading weights [3e16efc8] from D:\Code\Stable-Diffusion\AUTOMATIC1111\stable-diffusion-webui\models\Stable-diffusion\sd-v1-5-inpainting.ckpt
  Global Step: 440000
  Applying cross attention optimization (Doggettx).
  Model loaded.
  100%|██████████| 20/20 [00:02<00:00, 9.58it/s]
  Total progress: 100%|██████████| 40/40 [00:50<00:00, 1.25s/it]

Arron17 avatar Oct 20 '22 20:10 Arron17

There's a slight issue, not sure if it's just me. While inpainting with a mask, the unmasked area is always processed too, with what seems like a single sampling step. It's obvious in the previews and in the resulting image. Funny thing is, the color-corrected image is correct when that option is enabled. [screenshot attached] UPD: the whole image changes slightly with each iteration, not once at the start as I first thought. Experiment: [screenshots attached]

mezotaken avatar Oct 20 '22 22:10 mezotaken

There's a slight issue, not sure if it's just me. While inpainting with a mask, the unmasked area is always processed too, with what seems like a single sampling step. It's obvious in the previews and in the resulting image. Funny thing is, the color-corrected image is correct when that option is enabled. UPD: the whole image changes slightly with each iteration, not once at the start as I first thought.

This looks like a very old quirk that was fixed at some point: it modified a bit of the rest of the image to blend it better. From what I gather, color correction just applies post-processing to the image, so maybe OP's PR is somehow bypassing some crucial function.

Mozoloa avatar Oct 20 '22 22:10 Mozoloa

I've noticed whilst trying to produce comparisons that the inpainting model does not work when using the X/Y Plot.

@Arron17 This ended up being a more general bug with xy grid, but the fix is really small. Essentially, when loading a different set of weights with the same config, it would update the weights in place and the sampler would see the new model. When the config changes, the entire model has to be rebuilt, and the sampler's reference to the model wasn't being updated. I guess the issue has just never come up before since most models use the default config.
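A toy sketch of that bug, with hypothetical names: weight-only reloads mutate the existing model object so outside references stay valid, but a config change constructs a new object, and anything still holding the old reference (the sampler) must be pointed at it.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    weights: str
    config: str

class Model:
    def __init__(self, weights):
        self.weights = weights

class Loader:
    def __init__(self):
        self.model = None
        self.config = None

    def load(self, ckpt):
        if self.model is not None and ckpt.config == self.config:
            self.model.weights = ckpt.weights  # in-place: old references see the update
        else:
            self.model = Model(ckpt.weights)   # rebuilt: old references go stale
            self.config = ckpt.config
        return self.model  # callers (like the sampler) must re-fetch this

loader = Loader()
a = loader.load(Checkpoint("v1-5-pruned", "default"))
b = loader.load(Checkpoint("SD-v1-4", "default"))          # same object, new weights
c = loader.load(Checkpoint("sd-v1-5-inpainting", "inpainting"))  # brand-new object
```

Anything that cached `a` keeps working across the second load but silently points at the wrong model after the third, which matches the X/Y plot symptom.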

@AUTOMATIC1111 Let me know if this change doesn't belong in this PR.

random-thoughtss avatar Oct 20 '22 23:10 random-thoughtss

@AUTOMATIC1111 It looks like RunwayML's 1.5 checkpoint is actually based on the real 1.5 model from Stability AI. So, this isn't just some random model.

Source: https://www.reddit.com/r/StableDiffusion/comments/y91pp7/stable_diffusion_v15/

ProGamerGov avatar Oct 20 '22 23:10 ProGamerGov

Were RunwayML the original authors of SD? If so, we could very well switch to their repo and get rid of the hijacking code. I initially thought it was just some dude who trained the inpainting model.

Anyway, great work, I did some testing and it all seems to work fine.

AUTOMATIC1111 avatar Oct 21 '22 06:10 AUTOMATIC1111

CompVis is the original author. They're a research group staffed by student researchers (IIRC) and some RunwayML employees.

C43H66N12O12S2 avatar Oct 21 '22 06:10 C43H66N12O12S2

Masked Inpainting is broken right now on every model. Non-masked parts of the image are being changed slightly.

mezotaken avatar Oct 21 '22 07:10 mezotaken

It's working fine for me. [screenshots attached]

chekaaa avatar Oct 21 '22 07:10 chekaaa

That is so weird. Are you using color correction?

mezotaken avatar Oct 21 '22 07:10 mezotaken

Do you have the color correction setting selected by any chance?

chekaaa avatar Oct 21 '22 07:10 chekaaa

I'm using it, and it saves both images, look here: [screenshots attached]

wtf. So for me the color-corrected image is identical in the non-masked area, but the one without color correction is slightly distorted.

mezotaken avatar Oct 21 '22 07:10 mezotaken

Does it happen with images that don't have a completely white bg?

chekaaa avatar Oct 21 '22 07:10 chekaaa