ml-stable-diffusion
Inpainting affects non-transparent parts of the image
Hi! :) I'm testing the new inpainting functionality that has recently been pushed to the main branch.
I'm using the Stable Diffusion 1.5 model converted with this command:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-safety-checker --model-version "runwayml/stable-diffusion-v1-5" --unet-support-controlnet --quantize-nbits 6 --attention-implementation SPLIT_EINSUM_V2 --convert-controlnet "lllyasviel/sd-controlnet-canny" --bundle-resources-for-swift-cli -o "/path/to/save"
and the already converted InPaint-SE model from here.
I'm also using macOS Preview to erase everything in the image except my face, leaving those areas transparent, like so:
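(For anyone who wants to script that erase step instead of doing it in Preview, something along these lines should produce the same kind of transparency mask. This is just a minimal CoreGraphics sketch; the function name and keepRect are placeholders, not anything from this repo.)

```swift
import CoreGraphics

// Minimal sketch: clear everything outside a "keep" rect to transparent alpha,
// mimicking what Preview's eraser does. `keepRect` is in CoreGraphics
// coordinates (origin at the bottom-left of the image).
func maskEverythingExcept(_ keepRect: CGRect, in image: CGImage) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: image.width,
        height: image.height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }

    let fullRect = CGRect(x: 0, y: 0, width: image.width, height: image.height)
    context.clear(fullRect)          // start fully transparent
    context.clip(to: keepRect)       // only the kept region will be drawn
    context.draw(image, in: fullRect)
    return context.makeImage()
}
```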
The resulting image kinda uses my face, but messes it up, while I was expecting the face to remain unchanged.
This is happening on iPadOS using the main branch of this package, and also on the latest version of MochiDiffusion.
I don't think that it's intended. In Automatic1111, when using InPaint + Canny I get good results where the face remains unchanged.
That's strange. In my experiment, it worked fine.
Hmm, thanks @ynagatomo, will try to convert the same inpaint model you use myself
No changes with the newly converted model
Tested with just inpainting the face (instead of everything but the face) - it works ok-ish. I still get some noise/corruption outside the inpainted area (your example also has some minor color changes). Maybe it's not that visible in your example because it's not a photo but a painting? 🤔
When using the Automatic1111 WebUI and inpainting everything except the face (like in my example) - the face remains unchanged. In cases like these, even the slightest deformation on the person's face will result in a total mess :(
At least the masking feature for InPainting added by the PR is working. We may need to adjust the parameters and models. :)
I think the process may be sensitive to the base model being used, for some reason. When I use a given base model to generate the input image, and then that same base model (and the same seed when possible) for the ControlNet inpaint run, I get many fewer anomalies. I don't understand why that could be, but it seems to be that way for me.
Hey @SaladDays831! I checked out A1111's in-painting UI after seeing this issue. There are a lot of additional knobs that are built around the core in-painting functionality in order to make it work better for certain use cases. Some examples for these knobs are:
- Masked only vs whole picture mode (masked only zooms into the region to preserve details better)
- Mask blur (for blending)
- Mask padding (dilation)

None of this is implemented in our ControlNet support today, but I expect we will gradually support some of it through PRs.
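For reference, mask blur and mask padding are essentially preprocessing on the mask itself before it is handed to the pipeline. Here is a rough CoreImage sketch of what that could look like on the caller side; this is not anything that exists in the package today, and it assumes a white-on-black mask where white marks the region to repaint:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Rough sketch of "mask padding (dilation)" and "mask blur" as caller-side
// preprocessing on a white-on-black mask (white = region to repaint).
func preprocessMask(_ mask: CIImage, dilationRadius: Float, blurRadius: Float) -> CIImage {
    // Dilation: grow the white (repaint) region so the edit bleeds slightly
    // past the hard edge of the original mask.
    let dilate = CIFilter.morphologyMaximum()
    dilate.inputImage = mask
    dilate.radius = dilationRadius

    // Blur: soften the mask boundary so the inpainted pixels blend into
    // the untouched ones instead of ending at a hard seam.
    let blur = CIFilter.gaussianBlur()
    blur.inputImage = dilate.outputImage
    blur.radius = blurRadius

    // Gaussian blur expands the image extent; crop back to the original size.
    return blur.outputImage?.cropped(to: mask.extent) ?? mask
}
```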
Hi @atiorh :) Thanks for looking into this!
I didn't thoroughly test the difference, but there are two ways to do inpainting in A1111. All the settings you mentioned are present in the img2img -> inpaint tab (and you don't need a CN model for that, from what I see).
For my tests, I just used the imported inpainting model in the ControlNet section of the txt2img tab, which looks like the "core" inpainting functionality. It doesn't have all these fancy settings, plus I can test the same model version I'm trying to use with this package, and it works as expected (it doesn't change the un-inpainted parts at all).
Hi, I tried to add a Starting Image in Inpaint with SD1.5_cn, but it seems to have no effect and does not influence the resulting output image. I'm not sure if this is the correct behavior.
What commands or app are you using? You need to provide some details here before anyone can begin to help. Does your starting image have an area that is transparent to indicate what area is to be inpainted?
I use both swift diffusers and MochiDiffusion. I just tried the Swift CLI as well, and the starting image has no effect on the result there either.
Perhaps I didn't make myself clear. What I meant is that the results remain the same whether I include the Starting Image or not.
The images are below: the starting image, and the masked image used as the ControlNet input.
At the moment the InPaint ControlNet is broken in Mochi Diffusion. At least half the time, it is ignoring the masked input image. I have a build that appears to fix the problem, but I don't know if my builds can run on other people's machines because it is not an Apple notarized app. If you would like to try it, this is the download link: https://huggingface.co/jrrjrr/Playground/blob/main/Mochi%20Diffusion%20(macOS%2013).dmg
When you say "Starting Image", does that mean that you are trying to use 2 images? A masked image to define the inpaint area and a second image that you want to have fill the masked area? Can you explain a little more how you are setting it all up in either of your two methods?
In the Swift CLI, ControlNet InPaint only uses the --controlnet-inputs argument. You can't also use the --image argument; --image is for Image2Image.
This is the command I use (with my paths) for ControlNet:
swift run StableDiffusionSample "a photo of a cat" --seed 12 --guidance-scale 8.0 --step-count 24 --image-count 1 --scheduler dpmpp --compute-units cpuAndGPU --resource-path ../models/sd-5x7 --controlnet InPaint-5x7 --controlnet-inputs ../input/cat-5x7.png --output-path ../images
I set 2 images, as shown in the MochiDiffusion screenshot here:
The starting image is defined in the PipelineConfiguration:
/// Starting image for image2image or in-painting
public var startingImage: CGImage? = nil
I don't know if inpainting needs the starting image; I thought inpainting might reference the starting image to fix things up.
I apologize for causing some confusion.
ControlNet InPaint in Mochi only uses one input image. The masked image. The text prompt tells what to put in the masked area. The upper spot for an input image in Mochi only gets used with Image2Image. It has no effect on ControlNet.
This is with my test build. Remember, the build that downloads from the Mochi GitHub is presently broken for most ControlNets.
And yes, this is all very confusing because it is not explained well with visual examples anywhere. That is something Mochi needs to improve on.
In this example, everything is masked except the face. The text prompt tells it to use a "suit of armor" where the mask is.
Masked image
Prompt: Woman in flower print blouse
When I have used it with the Swift CLI, the inputs and logic are the same. The Python CLI pipeline may be different.
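For completeness, here is roughly how I understand those two inputs map onto the Swift package's pipeline configuration. This is a sketch from memory, so double-check the property names against the Configuration type on the current main branch; `pipeline` and `maskedImage` stand in for your own pipeline instance and masked CGImage:

```swift
import StableDiffusion
import CoreGraphics

// Sketch only: mirrors the CLI example above (same prompt, seed, steps, guidance).
var config = StableDiffusionPipeline.Configuration(prompt: "a photo of a cat")

// Image2Image path: startingImage is the picture that gets re-noised.
// Per the discussion above, the ControlNet InPaint path does not read it.
config.startingImage = nil

// ControlNet InPaint path: the masked image (transparent = area to repaint)
// goes in as a ControlNet input, one image per loaded ControlNet model.
config.controlNetInputs = [maskedImage]

config.stepCount = 24
config.seed = 12
config.guidanceScale = 8.0

let images = try pipeline.generateImages(configuration: config) { _ in true }
```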