Inpainting produces results that are uneven with input image
Describe the bug
SD inpainting works fine only if the mask is absolutely perfect.
Otherwise, there are always visible seams at the edge of the mask and uneven colors between the inpainted region and the input image.
I've tried manually assembling latents and using them for image and mask_image instead of images, as well as manually assembling the entire masked_image_latent; the results are the same, so I left the reproduction as simple as possible.
The same behavior is visible in the SD and SD-XL pipelines, using the base model as well as dedicated inpainting models.
Non-diffusers inpainting implementations, such as the legacy A1111 implementation, do not have this issue.
I've attached very simple reproduction code that:
- generates an image
- creates a mask as a square in the middle of the image
- runs the inpainting pipeline
Reproduction
import torch
import diffusers
from PIL import Image

# local single-file checkpoint used for the test
model = '/mnt/f/Models/stable-diffusion/cyan-2.5d-v1.safetensors'

# generate a base image with the text-to-image pipeline
pipe0 = diffusers.StableDiffusionPipeline.from_single_file(model).to('cuda')
print(pipe0.__class__.__name__)
base = pipe0('seashore').images[0]
print(base)

# build a mask: a white square covering the middle quarter of the image
mask = Image.new('L', base.size, 0)
square = Image.new('L', (base.size[0]//2, base.size[1]//2), 255)
mask.paste(square, (base.size[0]//4, base.size[1]//4))
print(mask)

# reuse the loaded components for the inpainting pipeline and run it
pipe1 = diffusers.AutoPipelineForInpainting.from_pipe(pipe0).to('cuda')
print(pipe1.__class__.__name__)
inpaint = pipe1('house', image=base, mask_image=mask, strength=0.75).images[0]
print(inpaint)

base.save('base.png')
mask.save('mask.png')
inpaint.save('inpaint.png')
Logs
No response
System Info
diffusers==0.23.0
Who can help?
@patrickvonplaten @yiyixuxu @DN6 @sayakpaul
Examples
Note: this issue was originally reported at https://github.com/vladmandic/automatic/issues/2501, which you can check for additional examples.
Thanks for the clean issue here! @yiyixuxu can you have a look?
hi @vladmandic:
I'm trying to compare with auto1111, but I'm seeing the same issue - can you tell me if there is anything wrong with my settings?
Played around with it a little bit more. I think the "mask blur" option helps with this issue. I will look into adding it to diffusers.
It is still not perfect, though. Let me know if there is anything else that I missed; I'm pretty new to auto1111, so it would help a lot if you could point me to the correct settings.
mask blur = 0
mask blur = 32
I think mask blur is really good at "hiding" the issue with inpaint; it would be a welcome addition to diffusers. The underlying problem still exists, but I'm really unsure how else to address it.
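For reference, "mask blur" amounts to feathering the mask before it reaches the pipeline. A minimal sketch with plain PIL, reusing the variables from the reproduction above (the radius is just an example value):

from PIL import ImageFilter

# feather the binary mask so the inpainted region fades into its surroundings
mask_blurred = mask.filter(ImageFilter.GaussianBlur(radius=16))
inpaint = pipe1('house', image=base, mask_image=mask_blurred, strength=0.75).images[0]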
@vladmandic
ok, I will add the mask blur! Agree that it does not seem to resolve the issue completely - but it seems like the underlying issue exists in both diffusers and auto1111, no? Just want to make sure so that I don't waste more time digging into auto1111's code base.
I'll dig into it more, you can focus on mask blur. If I find something else I'll update here.
I have had the time to test this out further and it looks like it's indeed very similar between Diffusers and the original backend - but not the same.
The best way to test this is an image with contrast, and the mask covering more than one object, like background/foreground/clothing and such.
Tests were done using @vladmandic's UI.
My test image:
The mask (no blur applied):
Results with the original backend (assuming it's the same as auto1111):
Results with the diffusers backend:
In both, the hue of the shirt is changed slightly, but I think it's (maybe subjectively) worse in diffusers. However, if we zoom in at the shoulder:
In the original, the fuzzy background color is almost untouched.
In the diffusers example, the fuzzy background color is getting almost the same treatment as the shirt, getting desaturated in what looks like an identical amount to the shirt.
I think a bit of a color shift is expected, maybe because of latent encoding, maybe because the model only "knows" certain colors or shades. So you would get different results with different colors, objects, and so on. But diffusers adds a plain discoloration to the image.
I also saw that the preview of the generation process hints at a difference:
Original backend only adds noise to the masked area, and denoises it. Diffusers seems to add noise to the entire image, and then reconstructs it somehow? Maybe during that reconstruction, color information gets lost.
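For context, my understanding is that with a regular (non-inpainting) checkpoint, the diffusers pipeline re-noises the original latents at every step and blends them back in over the unmasked area. A simplified sketch of that idea, not the actual pipeline code:

# per-step blending with a non-inpainting UNet (simplified sketch):
# the unmasked region is rebuilt from the re-noised original latents,
# only the masked region keeps the freshly denoised latents
init_latents_proper = scheduler.add_noise(init_latents, noise, timestep)
latents = (1 - mask) * init_latents_proper + mask * latents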
One thing to consider: When using auto1111 or derivative UIs, inpainting always pastes the generated image on top of the original with the mask applied. My theory is that the discoloration actually affects the entire generated image, and is only seen as a "border" in the result because of the post processing. Maybe @vladmandic could add a setting to his UI that outputs the unprocessed result, for debug purposes?
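The paste-back I'm describing is essentially a masked composite. A hypothetical sketch, not the actual UI code (generated, original, and mask are placeholder names):

from PIL import Image

# keep only the masked region from the generated image, everything else from the original;
# any discoloration outside the mask is discarded, but shows up as a seam at the mask edge
final = Image.composite(generated, original, mask)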
inpainting always pastes the generated image on top of the original with the mask applied.
There is no such thing; all the "magic" happens in preprocessing. The difference in the live preview is likely due to "mask only" vs "full image".
If you're referring to "Inpaint area", I always use the "Whole picture" option.
Using the TAESD live preview method, I can see no visible mask seams in the latents, and the whole picture seems to be discolored (but that could be TAESD):
Final image with visible seams:
SDXL inpainting is a lot worse with discoloration:
But again, no visible seams in the preview, everything is "equally" discolored:
That's interesting - can you try, in img2img advanced, disabling "full quality"? That basically forces usage of TAESD for the final decode as well.
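(In diffusers terms that's roughly equivalent to swapping the full VAE for TAESD, e.g. the madebyollin/taesd checkpoint; a sketch, not what the UI actually does:)

from diffusers import AutoencoderTiny

# decode with the same tiny autoencoder used for the live preview
# (madebyollin/taesdxl would be the SDXL variant)
pipe1.vae = AutoencoderTiny.from_pretrained('madebyollin/taesd').to('cuda')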
Sure:
So it's not a VAE thing, thus must be diffusers postprocessing?
So it's not a VAE thing, thus must be diffusers postprocessing?
I don't know for sure if the VAE is involved, but diffusers is definitely not doing it, since I just found the culprit in the UI:
https://github.com/vladmandic/automatic/blob/69bda18e239a8b4d7b9a3a2a7fd450f69351cbae/modules/processing.py#L940C38-L940C38
I added output_images.append(image) before this line and got a grid with the unprocessed and the processed result:
This should be useful as a setting. The full image discoloration can be jarring, but might be preferable over the mask seams, and might be easier to fix with an image editing program.
This should be useful as a setting. The full image discoloration can be jarring, but might be preferable over the mask seams, and might be easier to fix with an image editing program.
Good point, I'll add it.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
push
@castortroy83 we added two auto1111 features in https://github.com/huggingface/diffusers/pull/6072 that will help with the inpainting generation and mask edge issue.
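Roughly, the two additions are used like this (a sketch based on the docs; the values are only examples, and pipe/base/mask stand for any inpainting pipeline and its inputs):

# 1) mask blur, via the pipeline's mask processor
blurred_mask = pipe.mask_processor.blur(mask, blur_factor=33)

# 2) padding_mask_crop: inpaint only a crop around the masked area, then paste it back
result = pipe(
    'house',
    image=base,
    mask_image=blurred_mask,
    strength=0.75,
    padding_mask_crop=32,
).images[0]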
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hope this is solved now?
Sorry, while the part on the UI was a workaround that's better than nothing, it doesn't fix the underlying issue of discoloration. I didn't want to push this further, as I thought that was just how inpainting works. But I got into trying out ComfyUI, and it does inpainting almost perfectly. For comparison, here is the result with diffusers on SD.Next, everything updated to the latest version. Denoising at 0.99, with the SDXL 0.1 inpainting model:
And here is the ComfyUI result:
The inpainting is nearly perfect and there is almost no color shifting at all. It tells me that it's possible, and that it should be worthwhile to pursue a proper fix for this.
Are the models the same for your tests? If so, cc'ing @patil-suraj @yiyixuxu here.
Cc: @vladmandic as well.
Same model, same sampler, same denoising.
Another pointer to a possible cause/solution: In ComfyUI, the nodes for the above output look like this:
For curiosity's sake, I tried giving the sampler a separately decoded latent, instead of the one from the Inpaint Conditioning node:
The result:
Similar discoloration.
Alright. Could you maybe provide your diffusers code snippet?
Also
For curiosity's sake, I tried giving the sampler a separately decoded latent, instead of the one from the Inpaint Conditioning node:
Could you expand a bit more on this?
I did some tests; the image you're using is not 1024x1024, so I upscaled it to test the difference between them, and I don't see that much difference with the ComfyUI results:
Edit: I was comparing to the "bad" results of comfyui, I get what you mean now. I'll dig deeper into this.
Normal SDXL
source | inpainting | diff |
---|---|---|
Inpainting SDXL
source | inpainting | diff |
---|---|---|
Inpainting SDXL (blurred mask)
source | inpainting | diff |
---|---|---|
I tested it more, and the difference is that ComfyUI uses the "only inpaint mask" option by default, so it only affects the area around the mask. With this code:
image = pipe(
    prompt,
    image=base,
    mask_image=mask_blurred,
    guidance_scale=8,
    strength=0.99,
    num_inference_steps=20,
    generator=generator,
    padding_mask_crop=32,
).images[0]
The results are the same as comfyui:
@23pennies does the comment above from @asomoza help?
@asomoza Could you say which variable in that snippet is for the "only inpaint mask option"? Also, which model were you using?
@asomoza Could you say which variable in that snippet is for the "only inpaint mask option"? Also, which model were you using?
padding_mask_crop=32
https://huggingface.co/docs/diffusers/using-diffusers/inpaint#padding-mask-crop
and I tested it with the inpainting model which seems to "decolorize" the image more than the normal one.
https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1
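Loading it per the model card looks roughly like this:

import torch
from diffusers import AutoPipelineForInpainting

# dedicated SDXL inpainting checkpoint referenced above
pipe = AutoPipelineForInpainting.from_pretrained(
    'diffusers/stable-diffusion-xl-1.0-inpainting-0.1',
    torch_dtype=torch.float16,
    variant='fp16',
).to('cuda')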
I'm using SD.Next; it doesn't look like it implements it.
I've tried hacking it in myself, and the discoloration still happens:
I then tried hard-coding the arguments so they're as close to yours as possible:
output = shared.sd_model(
    "necklace",
    image=p.init_images[0],
    mask_image=p.image_mask,
    guidance_scale=8,
    strength=0.99,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(0),
    padding_mask_crop=32,
)
And the results are still discolored:
However, you mentioned blurring. I blurred the mask (manually this time):
and the results are overall less discolored:
But this still doesn't seem to be the solution. The discoloration is still there sometimes, and the blurred mask adds additional problems. You can see in the above example, where two buttons are on top of each other, that the lower one is faded out. That's the pipeline blending the result with the original image. The non-blended result is this:
In ComfyUI this doesn't happen, as I get nearly perfect results without blurring the mask.
Also, you said
I tested it more, and the difference is that ComfyUI uses the "only inpaint mask" option by default, so it only affects the area around the mask.
Could you point me to where you found this? My understanding of how ComfyUI works doesn't align with it.