Inpainting models barely apply any change even at denoise 1
When using an inpainting model, there's almost no change to the masked region, even with a denoise value of 1. However, the same workflow with a regular model seems to work.
With inpainting model:
With regular model:
Inpaint models need the "VAE Encode (for Inpainting)" node, or else this is what happens.
But with "VAE Encode (for Inpainting)" it's not usable at low denoise values, as it adds a grey fill to the masked region.
That grey fill is what the inpaint model expects.
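For anyone wondering what that grey fill actually is, here's a minimal sketch of the pre-encode step, paraphrasing what "VAE Encode (for Inpainting)" appears to do (not the exact node source):

```python
import torch

def grey_fill(pixels: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Blend masked pixels toward neutral grey (0.5) before VAE encoding,
    so the inpaint model sees "nothing to keep" in the masked region.
    pixels: [B, H, W, C] in 0..1; mask: [B, H, W], 1 = region to inpaint."""
    keep = (1.0 - mask).unsqueeze(-1)          # 1 where the original is kept
    return pixels * keep + 0.5 * (1.0 - keep)  # grey inside the mask
```

Which is also why low denoise values fail with this node: the sampler starts from grey, not from the original content.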
This is how it looks with denoise 0.5:
Is there a way to replicate A1111's inpaint with an inpainting model and fill method "original"?

> Is there a way to replicate A1111's inpaint with an inpainting model and fill method "original"?
You have to use SetLatentNoise instead of VAEEncode (for inpaint)
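For context, SetLatentNoiseMask doesn't touch the latent pixels at all; it just attaches the mask so the sampler restores the original latent outside the masked region. A rough sketch (assuming ComfyUI's `noise_mask` convention):

```python
import torch

def set_latent_noise_mask(samples: dict, mask: torch.Tensor) -> dict:
    """Sketch of SetLatentNoiseMask: leave the encoded latent untouched and
    store the mask; the sampler then keeps denoised output only inside the
    mask and the original latent everywhere else."""
    out = samples.copy()
    out["noise_mask"] = mask.reshape((-1, 1, mask.shape[-2], mask.shape[-1]))
    return out
```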
> Is there a way to replicate A1111's inpaint with an inpainting model and fill method "original"?

> You have to use SetLatentNoise instead of VAEEncode (for inpaint)
@ltdrdata VAEEncode should be used with inpainting models according to @comfyanonymous: https://github.com/comfyanonymous/ComfyUI/issues/1186#issuecomment-1674105576
Also, I used SetLatentNoise in my first comment (check the screenshots) https://github.com/comfyanonymous/ComfyUI/issues/1186#issue-1846063953
If you want to use any of the original image, I'm fairly sure you need to use the SetLatentNoise method with a normal model rather than an inpaint model. A couple of tutorials I made that might help you out: https://youtu.be/g9JXx4ik_rc - basic masking stuff; https://youtu.be/Yc13jDr9VRk - advanced masking and compositing stuff
I've already tried the proposed workflows. However, the reason I opened this issue is that there is missing functionality when using inpainting models: using a denoise value lower than 1 while keeping the original pixel values of the input image. This is one of the main benefits of using an inpainting model.
Here's an example from A1111:
As you can see, the consistency is much better than using a denoise value of 1 with "VAE Encode (for Inpainting)" and an inpainting model.
Did you find a solution for this? That damn gray also affects the color of the inpainting if the area is large.

> Did you find a solution for this? That damn gray also affects the color of the inpainting if the area is large.
No, I was hoping I would get an answer here. I believe it's an issue/missing feature.
Did you try this? https://github.com/comfyanonymous/ComfyUI/issues/668

> did you try this #668

Yes, but it doesn't seem to have a noticeable effect with inpainting models.
I also had the same problem. Even though a month has passed, I'll leave my observations here; maybe they'll be useful to someone. The inpaint model really doesn't work the same way as in A1111. You have to use VAE Encode (for Inpainting) and draw the mask exactly along the edges of the object. In the first example (denoise strength 0.71), I selected only the lips, and the model repainted them green, almost keeping the slight smile of the original image. In the second example (denoise strength 0.8) I masked the entire mouth area, and the model drew green lips but lost the original concept. Set Latent Noise Mask does not work with the inpaint model.
Sorry for a little mess on the screen :)
Definitely agree that it's an issue and I hope it gets resolved. Right now, inpainting in ComfyUI is deeply inferior to A1111, which is a letdown.
I'll reiterate: using "Set Latent Noise Mask" allows you to lower the denoising value and benefit from information already in the image (e.g., something you sketched yourself), but when using inpainting models, even a denoising value of 1 will give you an image pretty much identical to the original.
When not using an inpainting model, you can use the "Set Latent Noise Mask" approach; however, A1111 with a lower denoise value and an inpainting model gives much better results. So there is a lot of value in letting us use an inpainting model with "Set Latent Noise Mask".
The only way to use an inpainting model in ComfyUI right now is "VAE Encode (for Inpainting)", but this only works correctly with a denoising value of 1. And that means we cannot use the underlying image (e.g., sketch stuff ourselves).
What I expect: "Set Latent Noise Mask" works with inpainting models, changing the image according to the denoising value, same as A1111. Or "VAE Encode (for Inpainting)" with a lower denoising value should use the underlying image instead of grey.
What happens: "Set Latent Noise Mask" with an inpainting model changes almost nothing at any denoising value (including 1), while "VAE Encode (for Inpainting)" only works correctly with a denoising value of 1, making it impossible to use the underlying image efficiently or at all.
Does anyone here understand both Comfy and A1111 codebases well enough to explain why Comfy's inpainting behaviour is so different?
I spent a couple hours digging into how the VAEEncodeForInpaint, SetLatentNoiseMask, and KSampler nodes work, hoping I could submit a patch PR. So far I can see that the denoising/scheduling/masking logic appear to be equivalent, but I can't isolate the offending difference yet. It's gonna take a lot more time to grok without help.
@comfyanonymous any chance you could point us in the right direction?
I've identified a key difference!
In Automatic1111 webui, time elapsed during img2img is directly proportional to denoising strength, i.e.:
- For denoising = 1.0, generation takes X seconds
- For denoising = 0.5, generation takes 0.5X seconds
- For denoising = 0.01, generation takes 0.01X seconds
In ComfyUI, time elapsed is constant regardless of the denoising strength.
Now I know what to look for in the codebase! Stay tuned
> - For denoising = 1.0, generation takes X seconds
> - For denoising = 0.5, generation takes 0.5X seconds
> - For denoising = 0.01, generation takes 0.01X seconds
A1111 uses fewer steps in proportion to the denoising value, which is why it is faster at lower denoise; there is a setting in A1111 that disables this behavior. To get the same behavior in ComfyUI, just use fewer steps.
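A rough sketch of the difference (step counts only; illustrative, not the actual code of either project):

```python
def a1111_img2img_steps(steps: int, denoise: float) -> int:
    # A1111 (by default) only runs the tail of the schedule,
    # so runtime scales with the denoising strength.
    return max(1, int(steps * denoise))

def comfyui_img2img_steps(steps: int, denoise: float) -> int:
    # ComfyUI builds a longer schedule (roughly steps / denoise) and
    # runs the last `steps` of it, so runtime stays constant.
    return steps

# e.g. steps=20, denoise=0.5 -> A1111 runs ~10 steps, ComfyUI runs 20
```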
> To get the same behavior in ComfyUI, just use fewer steps.

It won't work. At 1-2 steps the quality will be low, but at 3 steps you will already just see the original.
Bump, no update on this issue? From my understanding, inpainting models use a mask as an extra input; the one fed to the inpainting model is wrong for some reason when using the Masked Latent node, resulting in the inpainting model using the fill data as a reference to inpaint (and since the fill is actually decent, it sees no reason to change anything).
This is a huge limitation, especially when trying to use Lama as a preprocessor to get a rough fill before running the inpaint model.
I haven't fully tested it yet, but it seems I can get around this issue by using a normal VAE Encode on the original image along with the VAE Encode (for Inpainting) and doing a Latent Blend.
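For anyone trying to reproduce this: the blend itself is just a linear interpolation of the two latents, along the lines of this sketch (names are illustrative):

```python
import torch

def latent_blend(samples1: torch.Tensor, samples2: torch.Tensor,
                 blend_factor: float) -> torch.Tensor:
    """Sketch: mix the grey-filled inpaint encode (samples1) with a plain
    encode of the original image (samples2), approximating the starting
    point a partial denoise would have given."""
    return samples1 * blend_factor + samples2 * (1.0 - blend_factor)
```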
@JoeNavark And with a 50% blend, do you get roughly what you would have expected from 0.5 denoise (more or less)?
This does somewhat work; you need to add an extra latent mask or it just inpaints everything. Still some feathering/edge issues to solve, but it appears to be doing what we want.
That's weird, I didn't have to add the second mask as long as the masked encode went to samples1.
I may have switched samples1 and samples2 before taking the screenshot to clean it up, so it might be because of that.
I solved this problem by modifying some code. The inpainting model requires receiving "noisy images" and "masked images" as inputs. WEBUI and ComfyUI use different processing methods, as shown in the following figure.
To solve this problem, the code in four places across three files needs to be modified:
- nodes.VAEEncodeForInpaint.encode() needs to output "latent_image" and "masked_latent_image"
- nodes.common_ksampler() needs to add "masked_latent_image" to the positive and negative conditions
- comfy.sampler.sample() needs to apply model.process_latent_in() to "masked_latent_image"
- comfy.model_base.BaseModel.extra_conds() needs to use "masked_latent_image" instead of "latent_image" as an element of "cond_concat"
If the project maintainer agrees with this plan, I would be happy to submit a PR to fix this issue.
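To make the intended input format concrete: SD inpainting checkpoints expect a 9-channel UNet input, the 4-channel noisy latent concatenated with a 1-channel mask and the 4-channel latent of the masked original image. A sketch of the concat this proposal would build (illustrative, not the actual patch):

```python
import torch

def build_cond_concat(mask: torch.Tensor,
                      masked_latent_image: torch.Tensor) -> torch.Tensor:
    """mask: [B, 1, h, w], downscaled to latent resolution (1 = inpaint here).
    masked_latent_image: [B, 4, h, w], VAE encode of the original image with
    the masked region blanked out (what WEBUI feeds the model).
    The UNet input is then cat([noisy_latent, cond_concat], dim=1) -> 9ch."""
    return torch.cat([mask, masked_latent_image], dim=1)  # [B, 5, h, w]
```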
> I solved this problem by modifying some code. The inpainting model requires receiving "noisy images" and "masked images" as inputs. WEBUI and ComfyUI use different processing methods, as shown in the following figure.
> To solve this problem, the code in four places across three files needs to be modified:
> - nodes.VAEEncodeForInpaint.encode() needs to output "latent_image" and "masked_latent_image"
> - nodes.common_ksampler() needs to add "masked_latent_image" to the positive and negative conditions
> - comfy.sampler.sample() needs to apply model.process_latent_in() to "masked_latent_image"
> - comfy.model_base.BaseModel.extra_conds() needs to use "masked_latent_image" instead of "latent_image" as an element of "cond_concat"
> If the project maintainer agrees with this plan, I would be happy to submit a PR to fix this issue.
Finally, an analysis has been conducted to identify the implementation differences. Good job!
Thanks a lot, supporting the idea of the PR. Indeed, the workaround is still not as good as a true partial denoise, because the latent blend appears to mess with the noise offset, resulting in some tint mismatch.
I submitted PR #2501.