ControlNet
Colored Scribble input
Hey,
Thanks for your nice work. Scribble works really well for generation. But there are a lot of scenarios where we want the input image to contain colors as well as strokes, and we want Stable Diffusion to process both. For example this image.
It has both the scribble and the color, and the model should take both as input. But the current implementation only considers the sketch, so the input color is ignored.
An example is shown above.
If you could train a model that takes the colored scribble and produces results according to it, that would be perfect.
Thanks
As a tip, img2img with ControlNet is more or less what you want: get the structure from the scribble boundaries and the colors from the img2img input. At a denoising strength of around 0.8, the color hints survive. Structure would be lost in regular img2img at that strength, but the ControlNet preserves it.
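If you use the diffusers library instead of the WebUI, a sketch like the following does the same thing. This is only a minimal illustration: the prompt, file names, and the exact strength value are placeholders, though the pipeline class and the model IDs are real.

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# ControlNet supplies the structure (scribble), the init image supplies the colors.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("colored_scribble.png")   # provides the colors (placeholder file)
control_image = load_image("scribble_edges.png")  # provides the structure (placeholder file)

result = pipe(
    prompt="a turtle, best quality",  # illustrative prompt
    image=init_image,                 # img2img input: color hints survive...
    control_image=control_image,      # ...while ControlNet keeps the structure
    strength=0.8,                     # ~0.8 denoising strength, as suggested above
    num_inference_steps=30,
).images[0]
result.save("out.png")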
@DiceOwl how would you go about using img2img with ControlNet? At the moment, the base model they use is text2img.
https://github.com/Mikubill/sd-webui-controlnet, an extension for Automatic1111, directly allows for img2img. More generally, stable diffusion is intrinsically img2img; text2img just replaces the input image with pure noise. So if you are the hacking kind, you could probably hack rudimentary img2img support into this repo with just a handful of lines of code: replace the pure noise with the right mixture of noise and the (diffused) input image, and adjust the denoising parameters so sampling does not start from pure noise.
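To make the "mixture of noise and (diffused) input image" concrete, here is a rough sketch written against a diffusers-style VAE and scheduler API (this repo's own DDIM sampler would need the equivalent change; the function name and the 0.18215 latent scale are assumptions taken from SD 1.x conventions):

import torch

def prepare_img2img_latents(vae, scheduler, image, strength=0.8, steps=50):
    # Sketch only. Instead of sampling from pure noise at t = T, encode the
    # input image and jump in at an intermediate timestep via the forward
    # process q(x_t | x_0).
    # image: tensor in [-1, 1], shape (1, 3, H, W)
    latents = vae.encode(image).latent_dist.sample() * 0.18215
    scheduler.set_timesteps(steps)
    # Skip the first (1 - strength) fraction of the schedule:
    # strength=1.0 is plain text2img, strength=0.0 returns the input image.
    start = int(steps * (1 - strength))
    t = scheduler.timesteps[start]
    noise = torch.randn_like(latents)
    # Forward diffusion: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    noisy = scheduler.add_noise(latents, noise, t)
    # Denoise only over the remaining timesteps instead of the full schedule.
    return noisy, scheduler.timesteps[start:]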
I'm also interested in this question. Have you solved it? I look forward to your reply! @lazy-nurd
When I try to mix noise with the diffused input image during denoising as described above, the resulting image tends to come out blurrier. Do you know why, and how can I solve it?