
Support Inpainting

Open 10undertiber opened this issue 1 year ago • 8 comments

It would be great to add input parameters to the current SD CLI to specify an input image and a mask file for running inpainting. For example:

./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely dog" --image ../input/alovelybench.png --mask ../input/alovelybench.mask.png

The input image: alovelybench

The input mask: alovelybench mask

The output: alovelybench output

Here are some references:

  1. https://stable-diffusion-art.com/inpainting_basics
  2. https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint

10undertiber avatar Dec 05 '23 16:12 10undertiber

@leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode also has quality issues: the images appear overly smoothed, blurred, and distorted. Even with a strength setting of 0.05, the final image bears little resemblance to the original.
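
As a rough sketch of that idea (illustrative only, not code from sd.cpp; the struct and function names here are made up): after each denoising step, the unmasked (black) region of the working latent is overwritten with the original image's latent, noised to the current timestep, so only the white region is actually generated.

```cpp
#include <cstddef>
#include <vector>

struct Latent {
    int c, h, w;              // channels, height, width (latent resolution)
    std::vector<float> data;  // c * h * w floats, channel-major
};

// mask is single-channel at latent resolution: 1.0 = inpaint (white), 0.0 = keep (black)
void blend_with_original(Latent& x, const Latent& orig_noised, const std::vector<float>& mask) {
    const size_t plane = static_cast<size_t>(x.h) * x.w;
    for (int c = 0; c < x.c; ++c) {
        for (size_t i = 0; i < plane; ++i) {
            const float m = mask[i];
            float& v = x.data[c * plane + i];
            // keep the original content where the mask is black, generated content where it is white
            v = m * v + (1.0f - m) * orig_noised.data[c * plane + i];
        }
    }
}
```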

FSSRepo avatar Dec 05 '23 16:12 FSSRepo

> @leejet I believe that is done by adding noise only to the white part of the latent image and, in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode also has quality issues: the images appear overly smoothed, blurred, and distorted. Even with a strength setting of 0.05, the final image bears little resemblance to the original.

It seems that img2img has some issues and the results are inconsistent with sd-webui.

leejet avatar Dec 06 '23 02:12 leejet

@leejet I think we should first solve that problem before considering adding the inpainting feature.

Inpainting models require a latent input with 9 channels: 4 for the usual latent, 4 more for the latent of the masked image, and 1 for the mask itself. There may also be a need for a slight modification to the autoencoder, but I will continue researching.
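
For reference, a minimal sketch of what that 9-channel input could look like (illustrative, not sd.cpp code), assuming the usual layout for SD inpaint checkpoints: noisy latent, then the VAE-encoded masked image, then the mask downsampled to latent resolution.

```cpp
#include <cstddef>
#include <vector>

// Each input is stored channel-major as c * h * w floats at latent resolution.
std::vector<float> build_inpaint_unet_input(const std::vector<float>& noisy_latent,   // 4 * h * w
                                            const std::vector<float>& masked_latent,  // 4 * h * w
                                            const std::vector<float>& mask,           // 1 * h * w
                                            int h, int w) {
    std::vector<float> x;
    x.reserve(static_cast<size_t>(9) * h * w);
    x.insert(x.end(), noisy_latent.begin(), noisy_latent.end());    // channels 0..3
    x.insert(x.end(), masked_latent.begin(), masked_latent.end());  // channels 4..7
    x.insert(x.end(), mask.begin(), mask.end());                    // channel 8
    return x;
}
```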

FSSRepo avatar Dec 06 '23 03:12 FSSRepo

> @leejet I think we should first solve that problem before considering adding the inpainting feature.
>
> Inpainting models require a latent input with 9 channels: 4 for the usual latent, 4 more for the latent of the masked image, and 1 for the mask itself. There may also be a need for a slight modification to the autoencoder, but I will continue researching.

Yes, the first step is to support the inpaint model. We can determine whether the currently loaded weights belong to an inpaint model from the shape of the weights.
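
A sketch of that shape check (illustrative, not the actual sd.cpp loader), assuming SD 1.x checkpoint naming where the first UNet convolution is "model.diffusion_model.input_blocks.0.0.weight": a regular model has 4 input channels there, an inpaint model has 9.

```cpp
#include <array>
#include <cstdint>
#include <string>

struct TensorInfo {
    std::string name;
    std::array<int64_t, 4> shape;  // e.g. {320, in_channels, 3, 3} for the first conv
};

bool looks_like_inpaint_model(const TensorInfo& first_conv) {
    // shape[1] is the number of input channels of the first convolution
    return first_conv.name == "model.diffusion_model.input_blocks.0.0.weight" &&
           first_conv.shape[1] == 9;
}
```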

leejet avatar Dec 06 '23 03:12 leejet

Hope someone makes a cute GUI for it, as inpainting will be much easier when you can just paint the mask through a GUI; not to mention, a C++ implementation with a GUI would be great.

Amin456789 avatar Dec 08 '23 20:12 Amin456789

@FSSRepo I see you're working on a web UI (can't wait for it). If possible, please add outpainting as well; it would be great to have. Also, a question: I have Dark Reader in my browser; will it make your web UI's background dark too? Working with a white background at night is very hard, in my opinion.

Amin456789 avatar Dec 24 '23 09:12 Amin456789

Would it be possible to implement a simple form of inpainting, where the user specifies a rectangular region using command-line parameters, and only this region of the input image is changed? For example, the user could pass four integers (x, y, height, width) that define the top-left corner and the dimensions of the rectangular region.
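
A small sketch of that idea (illustrative only, not part of sd.cpp): the four integers can be turned into an ordinary single-channel mask, white inside the rectangle to be regenerated and black elsewhere, which could then feed into whatever mask-based inpainting path exists.

```cpp
#include <cstdint>
#include <vector>

std::vector<uint8_t> make_rect_mask(int img_w, int img_h, int x, int y, int rect_w, int rect_h) {
    std::vector<uint8_t> mask(static_cast<size_t>(img_w) * img_h, 0);  // all black = keep
    for (int j = y; j < y + rect_h && j < img_h; ++j) {
        for (int i = x; i < x + rect_w && i < img_w; ++i) {
            if (i >= 0 && j >= 0) {
                mask[static_cast<size_t>(j) * img_w + i] = 255;        // white = inpaint
            }
        }
    }
    return mask;
}
```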

programmbauer avatar May 05 '24 20:05 programmbauer

@leejet I'm pretty interested in fixing img2img and adding inpainting; do you have any pointers as to why it's currently not matching stable diffusion webui?

balisujohn avatar Jul 31 '24 05:07 balisujohn

Does this project take bug bounties? I am willing to put money down to make this happen.

msglm avatar Dec 01 '24 16:12 msglm

@aagdev could you please implement inpainting here? I saw you added it to your mlimgsynth project. Thank you.

Amin456789 avatar Dec 03 '24 09:12 Amin456789

I think I got it. PR incoming soon-ish

stduhpf avatar Dec 04 '24 20:12 stduhpf

@msglm If you were serious about the bounty, lmk 😅

stduhpf avatar Dec 05 '24 17:12 stduhpf

> @msglm If you were serious about the bounty, lmk 😅

Yeah, I'm looking to put some money down to make it happen. My main concerns are:

  • Inpainting works with SDXL models (my primary example being Pony Diffusion)
  • The system works with no artifacting, absurd slowdowns, or other such jank on both ROCm and Vulkan
  • The interface just requires passing in an image of some kind (in code, it should be easy to integrate into GUIs, so projects like https://github.com/fszontagh/sd.cpp.gui.wx can take advantage of it)
  • In general, it should feel like the CLI-style Python-based implementations of Stable Diffusion
  • The project should not have any major changes that prevent it from compiling on GNU Guix (this means no needing to pre-generate anything to get it to work, as with the Vulkan backend)

There's probably some other stuff as well that may get brought up during development, but if it works on my machine I'm willing to pay for it. I usually pay in XMR, but I could do BTC or some other payment method. Alternatively, some kind of issue-based bug-bounty system, where issues get funding that's released upon completion, would work.

Long term, my goal is to have an easy-to-compile, single-binary application that provides a good workflow for working with Stable Diffusion. This is a step toward that.

msglm avatar Dec 06 '24 13:12 msglm

  • It works with SDXL models
  • It works with Vulkan (I can't get ROCm to work at all with my old GPU); other backends should work too
  • Performance is OK; the only noticeable slowdown is due to the repeated VAE encoding (once for the base image, and once again for the masked image)
  • The interface is simple enough, I think: same as img2img mode, with the --mask argument pointing to the image mask path (see the example below)
  • No extra dependencies or weird code patterns, so it should work on any system that already supports sdcpp
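
A usage sketch: the --mask flag is as described above, while the other flag names are assumed from the project's existing img2img options and may differ between versions:

./bin/sd --mode img2img -m ../models/model.safetensors -i ../input/alovelybench.png --mask ../input/alovelybench.mask.png -p "a lovely dog" -o ../output/alovelybench.out.png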

I don't have any crypto wallet though 😔

stduhpf avatar Dec 06 '24 17:12 stduhpf

@stduhpf A wallet costs nothing. 😏

Green-Sky avatar Dec 07 '24 18:12 Green-Sky

> @stduhpf A wallet costs nothing. 😏

Good point.

stduhpf avatar Dec 07 '24 19:12 stduhpf

@msglm Now that it's merged... 42CcDxbASzWQe5hAryPffZZtwWVmSm1oqdSKwa87hTENBWf1dwHUWLD6wQ1pKtz2ejC3oqZBrwXyzQNzRBmnC9kV6VH9F92 (Don't feel obligated)

stduhpf avatar Dec 30 '24 15:12 stduhpf