Inpainting produces results that are uneven with input image
Describe the bug
SD inpainting works fine only if the mask is absolutely perfect.
Otherwise, there are always visible seams at the edge of the mask and uneven colors between the inpainted region and the input image.
I've tried manually assembling latents and using them for image and mask_image instead of images, as well as manually assembling the entire masked_image_latent; the results are the same, so I left the reproduction as simple as possible.
The same behavior is visible in the SD and SD-XL pipelines, using the base model as well as dedicated inpainting models.
Non-diffusers inpainting implementations, such as the legacy A1111 implementation, do not have this issue.
I've attached very simple reproduction code that:
- generates an image
- creates a mask as a square in the middle of the image
- runs the inpainting pipeline
Reproduction
import torch
import diffusers
from PIL import Image

# local single-file checkpoint used for the test
model = '/mnt/f/Models/stable-diffusion/cyan-2.5d-v1.safetensors'

# generate a base image with the text-to-image pipeline
pipe0 = diffusers.StableDiffusionPipeline.from_single_file(model).to('cuda')
print(pipe0.__class__.__name__)
base = pipe0('seashore').images[0]
print(base)

# build a mask: a white square covering the middle quarter of the image
mask = Image.new('L', base.size, 0)
square = Image.new('L', (base.size[0]//2, base.size[1]//2), 255)
mask.paste(square, (base.size[0]//4, base.size[1]//4))
print(mask)

# reuse the loaded components for the inpainting pipeline and run it
pipe1 = diffusers.AutoPipelineForInpainting.from_pipe(pipe0).to('cuda')
print(pipe1.__class__.__name__)
inpaint = pipe1('house', image=base, mask_image=mask, strength=0.75).images[0]
print(inpaint)

base.save('base.png')
mask.save('mask.png')
inpaint.save('inpaint.png')
Logs
No response
System Info
diffusers==0.23.0
Who can help?
@patrickvonplaten @yiyixuxu @DN6 @sayakpaul
Examples
Note: this issue was originally reported at https://github.com/vladmandic/automatic/issues/2501, which you can check for additional examples.
Thanks for the clean issue here! @yiyixuxu can you have a look?
hi @vladmandic:
I'm trying to compare with auto1111, but I'm seeing the same issue - can you tell me if there is anything wrong with my settings?
Played around with it a little bit more. I think the "mask blur" option helps with this issue. I will look into adding it to diffusers.
It is still not perfect, though. Let me know if there is anything else that I missed; I'm pretty new to auto1111, so it would help a lot if you could point me to the correct settings.
mask blur = 0
mask blur = 32
I think mask blur is really good at "hiding" the issue with inpaint; it would be a welcome addition to diffusers. The underlying problem still exists, but I'm really unsure how else to address it.
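For reference, "mask blur" amounts to feathering the mask before it reaches the pipeline. A minimal sketch with plain PIL, reusing the variables from the reproduction above (the radius is just an example value):

from PIL import ImageFilter

# feather the binary mask so the inpainted region fades into its surroundings
mask_blurred = mask.filter(ImageFilter.GaussianBlur(radius=16))
inpaint = pipe1('house', image=base, mask_image=mask_blurred, strength=0.75).images[0]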
@vladmandic
ok, I will add the mask blur! Agree that it does not seem to resolve the issue completely - but it seems like the underlying issue exists in both diffusers and auto1111, no? Just want to make sure so that I don't waste more time digging into auto1111's code base.
I'll dig into it more, you can focus on mask blur. If I find something else I'll update here.
I have had the time to test this out further and it looks like it's indeed very similar between Diffusers and the original backend - but not the same.
The best way to test this is an image with contrast, and the mask covering more than one object, like background/foreground/clothing and such.
Tests were done using @vladmandic's UI.
My test image:
The mask (no blur applied):
Results with the original backend (assuming it's the same as auto1111):
Results with the diffusers backend:
In both, the hue of the shirt is changed slightly, but I think it's (maybe subjectively) worse in diffusers. However, if we zoom in at the shoulder:
In the original, the fuzzy background color is almost untouched.
In the diffusers example, the fuzzy background color is getting almost the same treatment as the shirt, getting desaturated in what looks like an identical amount to the shirt.
I think a bit of a color shift is expected, maybe because of latent encoding, maybe because the model only "knows" certain colors or shades. So you would get different results with different colors, objects, and so on. But diffusers adds a plain discoloration to the image.
I also saw that the preview of the generation process hints at a difference:
Original backend only adds noise to the masked area, and denoises it. Diffusers seems to add noise to the entire image, and then reconstructs it somehow? Maybe during that reconstruction, color information gets lost.
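For context, my understanding is that with a regular (non-inpainting) checkpoint, the diffusers pipeline re-noises the original latents at every step and blends them back in over the unmasked area. A simplified sketch of that idea, not the actual pipeline code:

# per-step blending with a non-inpainting UNet (simplified sketch):
# the unmasked region is rebuilt from the re-noised original latents,
# only the masked region keeps the freshly denoised latents
init_latents_proper = scheduler.add_noise(init_latents, noise, timestep)
latents = (1 - mask) * init_latents_proper + mask * latents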
One thing to consider: When using auto1111 or derivative UIs, inpainting always pastes the generated image on top of the original with the mask applied. My theory is that the discoloration actually affects the entire generated image, and is only seen as a "border" in the result because of the post processing. Maybe @vladmandic could add a setting to his UI that outputs the unprocessed result, for debug purposes?
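The paste-back I'm describing is essentially a masked composite. A hypothetical sketch, not the actual UI code (generated, original, and mask are placeholder names):

from PIL import Image

# keep only the masked region from the generated image, everything else from the original;
# any discoloration outside the mask is discarded, but shows up as a seam at the mask edge
final = Image.composite(generated, original, mask)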
inpainting always pastes the generated image on top of the original with the mask applied.
There is no such thing; all the "magic" happens in preprocessing. The difference in the live preview is likely due to "mask only" vs "full image".
If you're referring to "Inpaint area", I always use the "Whole picture" option.
Using the TAESD live preview method, I can see no visible mask seams in the latents, and the whole picture seems to be discolored (but that could be TAESD):
Final image with visible seams:
SDXL inpainting is a lot worse with discoloration:
But again, no visible seams in the preview, everything is "equally" discolored:
That's interesting - can you try, in img2img advanced, disabling "full quality"? That basically forces usage of TAESD for the final decode as well.
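(In diffusers terms that's roughly equivalent to swapping the full VAE for TAESD, e.g. the madebyollin/taesd checkpoint; a sketch, not what the UI actually does:)

from diffusers import AutoencoderTiny

# decode with the same tiny autoencoder used for the live preview
# (madebyollin/taesdxl would be the SDXL variant)
pipe1.vae = AutoencoderTiny.from_pretrained('madebyollin/taesd').to('cuda')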
Sure:
So it's not a VAE thing, thus must be diffusers postprocessing?
So it's not a VAE thing, thus must be diffusers postprocessing?
I don't know for sure if the VAE is involved, but diffusers is definitely not doing it, since I just found the culprit in the UI:
https://github.com/vladmandic/automatic/blob/69bda18e239a8b4d7b9a3a2a7fd450f69351cbae/modules/processing.py#L940C38-L940C38
I added output_images.append(image) before this line and got a grid with the unprocessed and the processed result:
This should be useful as a setting. The full image discoloration can be jarring, but might be preferable over the mask seams, and might be easier to fix with an image editing program.
This should be useful as a setting. The full image discoloration can be jarring, but might be preferable over the mask seams, and might be easier to fix with an image editing program.
Good point, I'll add it.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
push
@castortroy83 we added two auto1111 features in https://github.com/huggingface/diffusers/pull/6072 that will help with the inpainting generation and mask edge issue.
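Roughly, the two additions are used like this (a sketch based on the docs; the values are only examples, and pipe/base/mask stand for any inpainting pipeline and its inputs):

# 1) mask blur, via the pipeline's mask processor
blurred_mask = pipe.mask_processor.blur(mask, blur_factor=33)

# 2) padding_mask_crop: inpaint only a crop around the masked area, then paste it back
result = pipe(
    'house',
    image=base,
    mask_image=blurred_mask,
    strength=0.75,
    padding_mask_crop=32,
).images[0]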
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hope this is solved now?
Sorry, while the part on the UI was a workaround that's better than nothing, it doesn't fix the underlying issue of discoloration. I didn't want to push this further, as I thought that was just how inpainting works. But I got into trying out ComfyUI, and it does inpainting almost perfectly. For comparison, here is the result with diffusers on SD.Next, everything updated to the latest version. Denoising at 0.99, with the SDXL 0.1 inpainting model:
And here is the ComfyUI result:
The inpainting is nearly perfect and there is almost no color shifting at all. It tells me that it's possible, and that it should be worthwhile to pursue a proper fix for this.
Are the models the same for your tests? If so, cc'ing @patil-suraj @yiyixuxu here.
Cc: @vladmandic as well.
Same model, same sampler, same denoising.
Another pointer to a possible cause/solution: In ComfyUI, the nodes for the above output look like this:
For curiosity's sake, I tried giving the sampler a separately decoded latent, instead of the one from the Inpaint Conditioning node:
The result:
Similar discoloration.
Alright. Could you maybe provide your diffusers code snippet?
Also
For curiosity's sake, I tried giving the sampler a separately decoded latent, instead of the one from the Inpaint Conditioning node:
Could you expand a bit more on this?
I did some tests; the image you're using is not 1024x1024, so I upscaled it to test the difference between them, and I don't see that much difference with the ComfyUI results:
Edit: I was comparing to the "bad" results of comfyui, I get what you mean now. I'll dig deeper into this.
Normal SDXL
source | inpainting | diff |
---|---|---|
Inpainting SDXL
source | inpainting | diff |
---|---|---|
Inpainting SDXL (blurred mask)
source | inpainting | diff |
---|---|---|
I tested it more, and the difference is that ComfyUI uses the "only inpaint mask" option by default, so it only affects the area around the mask. With this code:
image = pipe(
    prompt,
    image=base,
    mask_image=mask_blurred,
    guidance_scale=8,
    strength=0.99,
    num_inference_steps=20,
    generator=generator,
    padding_mask_crop=32,
).images[0]
The results are the same as comfyui:
@23pennies does the comment above from @asomoza help?
@asomoza Could you say which variable in that snippet is for the "only inpaint mask option"? Also, which model were you using?
@asomoza Could you say which variable in that snippet is for the "only inpaint mask option"? Also, which model were you using?
padding_mask_crop=32
https://huggingface.co/docs/diffusers/using-diffusers/inpaint#padding-mask-crop
and I tested it with the inpainting model which seems to "decolorize" the image more than the normal one.
https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1
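Loading it per the model card looks roughly like this:

import torch
from diffusers import AutoPipelineForInpainting

# dedicated SDXL inpainting checkpoint referenced above
pipe = AutoPipelineForInpainting.from_pretrained(
    'diffusers/stable-diffusion-xl-1.0-inpainting-0.1',
    torch_dtype=torch.float16,
    variant='fp16',
).to('cuda')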
I'm using SD.Next; it doesn't look like it implements it.
I've tried hacking it in myself, and the discoloration still happens:
I then tried hard-coding the arguments so they're as close to yours as possible:
output = shared.sd_model(
    "necklace",
    image=p.init_images[0],
    mask_image=p.image_mask,
    guidance_scale=8,
    strength=0.99,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(0),
    padding_mask_crop=32,
)
And the results are still discolored:
However, you mentioned blurring. I blurred the mask (manually this time):
and the results are overall less discolored:
But this still doesn't seem to be the solution. The discoloration is still there sometimes, and the blurred mask adds additional problems. You can see in the above example, where two buttons are on top of each other, that the lower one is faded out. That's the pipeline blending the result with the original image. The non-blended result is this:
In ComfyUI this doesn't happen, as I get nearly perfect results without blurring the mask.
Also, you said
I tested it more, and the difference is that ComfyUI uses the "only inpaint mask" option by default, so it only affects the area around the mask.
Could you point me to where you found this? My understanding of how ComfyUI works doesn't align with it.