stable-diffusion.cpp
Support Inpainting
It would be great to add input parameters to the current sd CLI to specify an input image and a mask file to run inpainting. For example:
./bin/sd -m ../models/sd-v1-4.ckpt -p "a lovely dog" --image ../input/alovelybench.png --mask ../input/alovelybench.mask.png
The input image:
The input mask:
The output:
Here are some references:
- https://stable-diffusion-art.com/inpainting_basics
- https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/inpaint
@leejet I believe that is done by adding noise only to the white part of the latent image, and in the decoder, keeping the pixels of the black part unchanged. However, the image-to-image mode is also causing quality issues: the images appear overly smoothed, blurred, and distorted. Even with a strength setting of 0.05, the final image bears little resemblance to the original.
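Roughly, that masking step could look like the sketch below. This is only an illustration (it is not sd.cpp's actual code), assuming the mask has already been downscaled to latent resolution, with 1 marking the region to repaint and 0 the region to keep:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not sd.cpp's actual API: after each denoising step (or
// once at the end), blend the regenerated latent with the original latent so
// that only the masked (white) region changes. Buffers are CHW float, mask is HW.
void apply_latent_mask(std::vector<float>& denoised,        // latent being generated
                       const std::vector<float>& original,  // latent of the input image
                       const std::vector<float>& mask,      // 1 = repaint, 0 = keep
                       size_t channels, size_t latent_pixels) {
    for (size_t c = 0; c < channels; ++c) {
        for (size_t i = 0; i < latent_pixels; ++i) {
            size_t idx = c * latent_pixels + i;
            denoised[idx] = mask[i] * denoised[idx] + (1.0f - mask[i]) * original[idx];
        }
    }
}
```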
It seems that img2img has some issues and the results are inconsistent with sd-webui.
@leejet I think we should first solve that problem before considering adding the inpainting feature.
Inpainting models require a latent input with 9 channels: 4 for the usual latent channels, 4 more for the latent of the masked input image, and 1 for the mask itself. There may also be a need for a slight modification to the autoencoder, but I will continue researching.
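As an illustration of that layout, assembling the 9-channel UNet input would look roughly like the sketch below. It assumes the usual SD-inpainting convention (noisy latent, then the VAE-encoded masked image, then the downscaled mask); the names are hypothetical and not taken from sd.cpp:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the 9-channel input of an SD inpainting UNet (CHW layout):
//   channels 0-3: the noisy latent currently being denoised
//   channels 4-7: the VAE-encoded latent of the masked input image
//   channel  8  : the mask, downscaled to latent resolution
std::vector<float> build_inpaint_unet_input(const std::vector<float>& noisy_latent,   // 4 * H * W
                                            const std::vector<float>& masked_latent,  // 4 * H * W
                                            const std::vector<float>& mask) {         // 1 * H * W
    std::vector<float> input;
    input.reserve(noisy_latent.size() + masked_latent.size() + mask.size());
    input.insert(input.end(), noisy_latent.begin(), noisy_latent.end());
    input.insert(input.end(), masked_latent.begin(), masked_latent.end());
    input.insert(input.end(), mask.begin(), mask.end());
    return input;  // shape [9, H, W]
}
```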
Yes, the first step is to support the inpaint model. We can determine whether the currently loaded weights belong to an inpaint model from the shape of the weights.
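That check could look something like the sketch below. It assumes the first UNet conv weight (the tensor usually named model.diffusion_model.input_blocks.0.0.weight in SD 1.x checkpoints) has shape [out_ch, in_ch, 3, 3]; how sd.cpp actually reads that shape may differ:

```cpp
#include <cstdint>

// Hypothetical sketch: an inpainting UNet's first conv takes 9 input channels
// instead of the usual 4, so the shape of that weight identifies the variant.
enum class ModelVariant { Regular, Inpaint, Unknown };

ModelVariant detect_variant_from_first_conv(int64_t in_channels) {
    switch (in_channels) {
        case 4:  return ModelVariant::Regular;  // standard txt2img/img2img checkpoint
        case 9:  return ModelVariant::Inpaint;  // inpainting checkpoint
        default: return ModelVariant::Unknown;
    }
}
```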
Hope someone makes a nice GUI for it, since inpainting will be much easier when you can paint the mask through a GUI. Not to mention that C++ with a GUI would be great.
@FSSRepo I see you're working on a web UI (can't wait for it). If possible, please add outpainting as well; it would be great to have. Also, a question: I have Dark Reader in my browser; will it make your web UI's background dark too? Working with a white background at night is very hard, IMO.
Would it be possible to implement a simple form of inpainting, where the user specifies a rectangular region using command line parameters? And only this region of the input image would be changed. For example, the user could pass four integers (x,y,height,width) which define the top left corner and the dimensions of the rectangular region.
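If something like that were added, the rectangle could simply be expanded into the same single-channel mask that a mask image file would provide, roughly as in the sketch below (hypothetical code, not an existing sd.cpp option):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: build a binary mask (255 = repaint, 0 = keep) from a
// rectangular region given on the command line. The result has the same
// dimensions as the input image and could feed the normal mask path.
std::vector<uint8_t> make_rect_mask(int img_w, int img_h,
                                    int x, int y, int rect_w, int rect_h) {
    std::vector<uint8_t> mask(static_cast<size_t>(img_w) * img_h, 0);
    for (int row = y; row < y + rect_h && row < img_h; ++row) {
        if (row < 0) continue;
        for (int col = x; col < x + rect_w && col < img_w; ++col) {
            if (col < 0) continue;
            mask[static_cast<size_t>(row) * img_w + col] = 255;
        }
    }
    return mask;
}
```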
@leejet I'm pretty interested in fixing img2img and adding inpainting; do you have any pointers as to why it's currently not matching stable diffusion webui?
Does this project take bug bounties? I am willing to put money down to make this happen.
@aagdev could you please implement inpainting here? I saw you added it to your project mlimgsynth. Thank you.
I think I got it. PR incoming soon-ish
@msglm If you were serious about the bounty, lmk 😅
Yeah, I'm looking to put some money down to make it happen. My main concerns are:
- Inpainting works with SDXL models (my primary example being pony diffusion)
- The system works with no artifacting, absurd slowdowns, or other such jank on both ROCm and Vulkan
- The interface for doing this just requires passing in an image of some kind (In code, it should be easy to implement into guis so projects like https://github.com/fszontagh/sd.cpp.gui.wx can take advantage of it).
- In general, it should feel like how it does on the CLI-style, Python-based implementations of Stable Diffusion
- The project should not have any major changes to it so it can compile on GNU Guix (this means no needing to pre-generate anything to get it to work like with the Vulkan backend)
There's probably some other stuff as well that may get brought up during development, but if it works on my machine I'm willing to pay for it. I usually pay in XMR, but I could do BTC or some other payment method. Alternatively, some kind of issue-based bug bounty method, where issues get funding that's released upon completion, would work.
Long-term, my goal is to have an easy-to-compile, single-binary application that makes working with Stable Diffusion a good workflow. This is a step toward that.
- It works with SDXL models
- It works with Vulkan (I can't get ROCm to work at all with my old GPU); other backends should work too
- Performance is OK; the only noticeable slowdown comes from the repeated VAE encoding (once for the base image, and once again for the masked image)
- The interface is simple enough, I think (same as img2img mode, with the `--mask` argument to point to the image mask path)
- No extra dependencies or weird code patterns, so it should work on any system that already supports sd.cpp
I don't have any crypto wallet though 😔
@stduhpf A wallet costs nothing. 😏
Good point.
@msglm
Now that it's merged....
42CcDxbASzWQe5hAryPffZZtwWVmSm1oqdSKwa87hTENBWf1dwHUWLD6wQ1pKtz2ejC3oqZBrwXyzQNzRBmnC9kV6VH9F92
(Don't feel obligated)