ml-stable-diffusion
Inpainting affects non-transparent parts of the image
Hi! :) I'm testing the new inpainting functionality that has recently been pushed to the main branch.
I'm using the Stable Diffusion 1.5 model converted with this command:
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-vae-encoder --convert-safety-checker --model-version "runwayml/stable-diffusion-v1-5" --unet-support-controlnet --quantize-nbits 6 --attention-implementation SPLIT_EINSUM_V2 --convert-controlnet "lllyasviel/sd-controlnet-canny" --bundle-resources-for-swift-cli -o "/path/to/save"
and the already converted InPaint-SE model from here.
I'm also using macOS Preview to erase everything in the image except my face, leaving those areas transparent, like so:
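(For anyone who wants to script that erase step instead of doing it in Preview, something along these lines should produce the same kind of transparency mask. This is just a minimal CoreGraphics sketch; the function name and keepRect are placeholders, not anything from this repo.)

```swift
import CoreGraphics

// Minimal sketch: clear everything outside a "keep" rect to transparent alpha,
// mimicking what Preview's eraser does. `keepRect` is in CoreGraphics
// coordinates (origin at the bottom-left of the image).
func maskEverythingExcept(_ keepRect: CGRect, in image: CGImage) -> CGImage? {
    guard let context = CGContext(
        data: nil,
        width: image.width,
        height: image.height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue
    ) else { return nil }

    let fullRect = CGRect(x: 0, y: 0, width: image.width, height: image.height)
    context.clear(fullRect)          // start fully transparent
    context.clip(to: keepRect)       // only the kept region will be drawn
    context.draw(image, in: fullRect)
    return context.makeImage()
}
```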
The resulting image kinda uses my face, but messes it up, while I was expecting the face to remain unchanged.
This is happening on iPadOS using the main branch of this package, and also on the latest version of MochiDiffusion.
I don't think that it's intended. In Automatic1111, when using InPaint + Canny I get good results where the face remains unchanged.
That's strange. In my experiment, it worked fine.
Hmm, thanks @ynagatomo, will try to convert the same inpaint model you use myself
No changes with the newly converted model
Tested with just inpainting the face (instead of everything but the face) - it works ok-ish. I still get some noise/corruption outside the inpainted area (your example also has some minor color changes). Maybe it's not that visible in your example because it's not a photo but a painting? 🤔
When using the Automatic1111 WebUI and inpainting everything except the face (like in my example) - the face remains unchanged. In cases like these, even the slightest deformation on the person's face will result in a total mess :(
At least the masking feature for InPainting added by the PR is working. We may need to adjust the parameters and models. :)
I think the process may be sensitive to the base model being used, for some reason. When I use a given base model to generate the input image, and then that same base model (and the same seed when possible) for the ControlNet inpaint run, I get many fewer anomalies. I don't understand why that could be, but it seems to be that way for me.
Hey @SaladDays831! I checked out A1111's in-painting UI after seeing this issue. There are a lot of additional knobs that are built around the core in-painting functionality in order to make it work better for certain use cases. Some examples for these knobs are:
- Masked only vs whole picture mode (masked only zooms into the region to preserve details better)
- Mask blur (for blending)
- Mask padding (dilation)

None of this is implemented in our ControlNet support today, but I expect we will gradually support some of it through PRs.
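For reference, mask blur and mask padding are essentially preprocessing on the mask itself before it is handed to the pipeline. Here is a rough CoreImage sketch of what that could look like on the caller side; this is not anything that exists in the package today, and it assumes a white-on-black mask where white marks the region to repaint:

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Rough sketch of "mask padding (dilation)" and "mask blur" as caller-side
// preprocessing on a white-on-black mask (white = region to repaint).
func preprocessMask(_ mask: CIImage, dilationRadius: Float, blurRadius: Float) -> CIImage {
    // Dilation: grow the white (repaint) region so the edit bleeds slightly
    // past the hard edge of the original mask.
    let dilate = CIFilter.morphologyMaximum()
    dilate.inputImage = mask
    dilate.radius = dilationRadius

    // Blur: soften the mask boundary so the inpainted pixels blend into
    // the untouched ones instead of ending at a hard seam.
    let blur = CIFilter.gaussianBlur()
    blur.inputImage = dilate.outputImage
    blur.radius = blurRadius

    // Gaussian blur expands the image extent; crop back to the original size.
    return blur.outputImage?.cropped(to: mask.extent) ?? mask
}
```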
Hi @atiorh :) Thanks for looking into this!
I didn't thoroughly test the difference, but there are two ways to do inpainting in A1111. All the settings you mentioned are present in the img2img -> inpaint tab (and you don't need a CN model for that, from what I see).
For my tests, I just used the imported inpainting model in the ControlNet section of the txt2img tab, which looks like the "core" inpainting functionality. It doesn't have all these fancy settings, plus I can test the same model version I'm trying to use with this package, and it works as expected (it doesn't change the un-inpainted parts at all).
Hi, I tried to add a Starting Image in Inpaint with SD1.5_cn, but it seems to have no effect and does not influence the resulting output image. I'm not sure if this is the correct behavior.
What commands or app are you using? You need to provide some details here before anyone can begin to help. Does your starting image have an area that is transparent to indicate what area is to be inpainted?
I use both swift diffusers and MochiDiffusion. I just tried the Swift CLI as well, and the starting image has no effect on the result there either.
Perhaps I didn't make myself clear. What I meant is that the results remain the same whether I include the Starting Image or not.
The images are below: the starting image, and the masked image used as the ControlNet input.
At the moment the InPaint ControlNet is broken in Mochi Diffusion. At least half the time, it is ignoring the masked input image. I have a build that appears to fix the problem, but I don't know if my builds can run on other people's machines because it is not an Apple notarized app. If you would like to try it, this is the download link: https://huggingface.co/jrrjrr/Playground/blob/main/Mochi%20Diffusion%20(macOS%2013).dmg
When you say "Starting Image", does that mean that you are trying to use 2 images? A masked image to define the inpaint area and a second image that you want to have fill the masked area? Can you explain a little more how you are setting it all up in either of your two methods?
In the Swift CLI, ControlNet InPaint only uses the --controlnet-inputs argument. You can't also use the --image argument; --image is for Image2Image.
This is the command I use (with my paths) for ControlNet:
swift run StableDiffusionSample "a photo of a cat" --seed 12 --guidance-scale 8.0 --step-count 24 --image-count 1 --scheduler dpmpp --compute-units cpuAndGPU --resource-path ../models/sd-5x7 --controlnet InPaint-5x7 --controlnet-inputs ../input/cat-5x7.png --output-path ../images
I set 2 images, as shown in the MochiDiffusion screenshot here:
The starting image is defined in the PipelineConfiguration:
/// Starting image for image2image or in-painting
public var startingImage: CGImage? = nil
I don't know if inpainting needs the starting image; I thought inpainting might reference the starting image to fix things up.
I apologize for causing some confusion.
ControlNet InPaint in Mochi only uses one input image. The masked image. The text prompt tells what to put in the masked area. The upper spot for an input image in Mochi only gets used with Image2Image. It has no effect on ControlNet.
This is with my test build. Remember, the build that downloads from the Mochi GitHub is presently broken for most ControlNets.
And yes, this is all very confusing because it is not explained well with visual examples anywhere. That is something Mochi needs to improve on.
In this example, everything is masked except the face. The text prompt tells it to use a "suit of armor" where the mask is.
Masked image
Prompt: Woman in flower print blouse
When I have used it with the Swift CLI, the inputs and logic are the same. The Python CLI pipeline may be different.
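For completeness, here is roughly how I understand those two inputs map onto the Swift package's pipeline configuration. This is a sketch from memory, so double-check the property names against the Configuration type on the current main branch; `pipeline` and `maskedImage` stand in for your own pipeline instance and masked CGImage:

```swift
import StableDiffusion
import CoreGraphics

// Sketch only: mirrors the CLI example above (same prompt, seed, steps, guidance).
var config = StableDiffusionPipeline.Configuration(prompt: "a photo of a cat")

// Image2Image path: startingImage is the picture that gets re-noised.
// Per the discussion above, the ControlNet InPaint path does not read it.
config.startingImage = nil

// ControlNet InPaint path: the masked image (transparent = area to repaint)
// goes in as a ControlNet input, one image per loaded ControlNet model.
config.controlNetInputs = [maskedImage]

config.stepCount = 24
config.seed = 12
config.guidanceScale = 8.0

let images = try pipeline.generateImages(configuration: config) { _ in true }
```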