
[Scribble] How to catch the color in input image?

Open DevJunghun opened this issue 1 year ago • 6 comments

Hi, thanks for sharing this library for image generation. There is one question I want to ask.

I want to keep the colors of the input image. For example, in this image I want the character's various colors (yellow, red, blue, green, etc.) to appear in the output image, but the output image does not preserve them.

I use this code to generate the output image:

    import random

    import einops
    import numpy as np
    import torch
    from pytorch_lightning import seed_everything

    # resize_image and HWC3 come from ControlNet's annotator/util.py

    with torch.no_grad():
        img = resize_image(HWC3(input_image), image_resolution)
        H, W, C = img.shape

        # Scribble detection: any pixel darker than mid-gray becomes a white line.
        # This binarization keeps only edges; all color information is discarded here.
        detected_map = np.zeros_like(img, dtype=np.uint8)
        detected_map[np.min(img, axis=2) < 127] = 255

        control = torch.from_numpy(detected_map.copy()).float().cuda() / 255.0
        control = torch.stack([control for _ in range(num_samples)], dim=0)
        control = einops.rearrange(control, 'b h w c -> b c h w').clone()

        if seed == -1:
            seed = random.randint(0, 999999999)
        seed_everything(seed)

        if config.save_memory:
            model.low_vram_shift(is_diffusing=False)

        cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([extra_prompt + prompt] * num_samples)]}
        un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([negative_prompt] * num_samples)]}
        shape = (4, H // 8, W // 8)

        if config.save_memory:
            model.low_vram_shift(is_diffusing=True)

        model.control_scales = [strength * (0.825 ** float(12 - i)) for i in range(13)] if guess_mode else ([strength] * 13)  # Magic number. IDK why. Perhaps because 0.825**12<0.01 but 0.826**12>0.01
        samples, intermediates = ddim_sampler.sample(ddim_steps, num_samples,
                                                     shape, cond, verbose=False, eta=eta,
                                                     unconditional_guidance_scale=scale,
                                                     unconditional_conditioning=un_cond)

        if config.save_memory:
            model.low_vram_shift(is_diffusing=False)

        x_samples = model.decode_first_stage(samples)
        x_samples = (einops.rearrange(x_samples, 'b c h w -> b h w c') * 127.5 + 127.5).cpu().numpy().clip(0, 255).astype(np.uint8)

        results = [x_samples[i] for i in range(num_samples)]
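For reference, a minimal NumPy sketch (toy data, not from the repo) of why the scribble preprocessor above cannot preserve color: every sufficiently dark pixel is mapped to the same white value, regardless of its original hue, so the conditioning image the model sees is purely black-and-white.

```python
import numpy as np

# toy 2x2 "input image": a red pixel, a blue pixel, a green pixel, white background
img = np.array([[[200,   0,   0], [  0,   0, 200]],
                [[  0, 200,   0], [255, 255, 255]]], dtype=np.uint8)

# same binarization as in the snippet above
detected_map = np.zeros_like(img, dtype=np.uint8)
detected_map[np.min(img, axis=2) < 127] = 255

# red, blue and green pixels all collapse to the identical white scribble value
print(detected_map[0, 0], detected_map[0, 1], detected_map[1, 0])  # all [255 255 255]
print(detected_map[1, 1])                                          # background stays [0 0 0]
```

This is why modifying only the boundary-detection code is not enough: the color never reaches the model through this conditioning path.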

How can I keep the colors of the input image in the output image? I think I can fix this by modifying the code that detects the boundary, right? I'll be waiting for your opinions.

Thank you.

DevJunghun avatar Jun 13 '23 07:06 DevJunghun

Maybe you can try multi-control with controlnet_scribble + t2i_color in the webui

fangfchen avatar Jul 07 '23 02:07 fangfchen

@fangfchen Thank you for your comment. But I'm not using the webui, only the CLI. I will research how to use t2i_color in a CLI environment. Thank you!

DevJunghun avatar Jul 07 '23 03:07 DevJunghun

Hello, I also want to achieve this effect. Do you have any good solutions currently?

lxxie298 avatar Dec 06 '23 14:12 lxxie298

As fangfchen already stated, this can be achieved with multi-ControlNet (scribble + color). The diffusers framework has support for multi-ControlNet. How to implement this with vanilla PyTorch I don't know, but you can look it up in the CN A1111 extension. Another approach is "regional prompting", where you use different prompts for specific areas of the latent space, e.g. the upper-left corner gets "red ear" added to its prompt and the lower-left corner gets "green leg".
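For context on what the color branch of such a multi-control setup consumes: color preprocessors typically reduce the input to a blocky, low-resolution color grid. A minimal NumPy sketch of that idea (a hypothetical helper, not the actual t2i_color preprocessor code; the cell size of 64 is an illustrative assumption):

```python
import numpy as np

def color_grid(img: np.ndarray, cell: int = 64) -> np.ndarray:
    """Average `cell` x `cell` blocks of an (H, W, 3) image, then upsample
    nearest-neighbor back to the (cropped) original size, producing a blocky
    color map that carries palette/layout but no fine detail."""
    H, W, C = img.shape
    h, w = H // cell, W // cell
    # mean over each cell x cell block
    small = img[:h * cell, :w * cell].reshape(h, cell, w, cell, C).mean(axis=(1, 3))
    # nearest-neighbor upsample back to the block-aligned size
    return np.repeat(np.repeat(small, cell, axis=0), cell, axis=1).astype(np.uint8)

grid = color_grid(np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8))
print(grid.shape)  # (512, 512, 3), constant within each 64x64 block
```

The scribble control then supplies the line structure while a map like this supplies the coarse color layout.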

geroldmeisinger avatar Dec 07 '23 14:12 geroldmeisinger

@lxxie298 I later tried another method, using the color map as input for img2img (with around 0.9 denoising weight), which could also achieve a similar effect. I hope it will be helpful.
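To illustrate why a denoising strength around 0.9 behaves this way: img2img forward-noises the encoded init image to timestep `strength * T` before denoising, and at t = 0.9T almost all of the signal is replaced by noise, so only the coarse color layout survives as a bias. A self-contained sketch of that noising step with a standard DDPM-style linear beta schedule (the schedule values are common defaults, an assumption, not this repo's code):

```python
import torch

# forward noising used by img2img: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def noise_to_strength(x0: torch.Tensor, strength: float) -> torch.Tensor:
    t = min(int(strength * T), T - 1)   # img2img starts denoising from this timestep
    a = alpha_bar[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)

x0 = torch.zeros(1, 4, 64, 64)          # stand-in for the encoded color map
x_start = noise_to_strength(x0, 0.9)    # mostly noise; only a faint trace of x0 remains
```

At strength 0.9, `alpha_bar[900]` is well below 0.01, so the color map only gently steers the result rather than constraining it, which matches the "similar effect" described above.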

fangfchen avatar Dec 10 '23 16:12 fangfchen

> @lxxie298 I later tried another method, using the color map as input for img2img (with around 0.9 denoising weight), which could also achieve a similar effect. I hope it will be helpful.

Hey fangfchen, how did this work for you? Were the results promising? I'm worried that a color map would pick up the white background and cause issues.

lamper-mit avatar Feb 05 '24 18:02 lamper-mit