StableDiffusionXLControlNetInpaintPipeline unable to use padding_mask_crop with multiple controlnets
Describe the bug
padding_mask_crop works with no ControlNet and with a single ControlNet, but with multiple ControlNets the library raises: `ValueError: The image should be a PIL image when inpainting mask crop, but is of type <class 'list'>.` I have double-checked that my other inputs are as required.
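Looking at the traceback under Logs below, `__call__` forwards `control_image` into the `image` parameter of `check_inputs`, which insists on a single PIL image whenever `padding_mask_crop` is set. With multiple ControlNets, `control_image` has to be a list of conditioning images, so the check can never pass. A minimal illustration of the conflict, paraphrasing the validation shown in the logs:

```python
import PIL.Image

# Two ControlNets force control_image to be a list of conditioning images.
control_image = [PIL.Image.new("RGB", (1024, 1024))] * 2
padding_mask_crop = 60

# Paraphrase of the check in check_inputs (see the traceback): a list always fails.
if padding_mask_crop is not None and not isinstance(control_image, PIL.Image.Image):
    raise ValueError(
        f"The image should be a PIL image when inpainting mask crop, but is of type {type(control_image)}."
    )
```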
Reproduction
```python
from diffusers import StableDiffusionXLControlNetInpaintPipeline, ControlNetModel, AutoencoderKL, UniPCMultistepScheduler
from diffusers.utils import load_image, make_image_grid
from PIL import Image
import cv2
import numpy as np
import torch
from torchvision import transforms

def preprocess_image(image, resize=(1024, 1024)):
    # Resize and drop the extra alpha channel (RGBA -> RGB)
    preprocess = transforms.Compose([
        transforms.Resize(resize),                 # resize to the model's expected input size
        transforms.ToTensor(),                     # convert the image to a PyTorch tensor
        transforms.Lambda(lambda x: x[:3, :, :]),  # keep only the RGB channels
        # transforms.Normalize([0.5], [0.5]),      # normalize to [-1, 1] range
        transforms.ToPILImage(),
    ])
    return preprocess(image)

def process_controlnet_image(image):
    # Build a Canny edge map for the ControlNet conditioning image
    image = np.array(image)
    low_threshold = 100
    high_threshold = 200
    image = cv2.Canny(image, low_threshold, high_threshold)
    image = image[:, :, None]
    image = np.concatenate([image, image, image], axis=2)
    return Image.fromarray(image)

controlnet1 = ControlNetModel.from_pretrained(r"controlnet\controlnet-canny-sdxl-1.0", torch_dtype=torch.float16, use_safetensors=True, local_files_only=True)
controlnet2 = ControlNetModel.from_pretrained(r"controlnet\controlnet-depth-sdxl-1.0", torch_dtype=torch.float16, use_safetensors=True, variant="fp16", local_files_only=True)
controlnets = [controlnet1, controlnet2]
model = "hugginface_epicrealism"
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(model, controlnet=controlnets, torch_dtype=torch.float16, use_safetensors=True).to("cuda")
pipe.enable_model_cpu_offload()

generator = torch.Generator("cuda").manual_seed(31)
prompt = "luxurious mixed use development in the day"
init_image = preprocess_image(Image.open(r"./inpainting_img/Inpainting01_InputImage.png"))
control_image = process_controlnet_image(preprocess_image(Image.open(r"inpainting_img\Inpainting01_InputImageControlnet.png")))
control_image.show()  # preview the Canny conditioning image
control_images = [control_image, control_image]
mask_image = preprocess_image(Image.open(r"./inpainting_img/Inpainting01_InputImageMask.jpg"))

images = pipe(
    prompt,
    image=init_image,
    mask_image=mask_image,
    guidance_scale=0.7,
    control_image=control_images,
    num_inference_steps=30,
    num_images_per_prompt=2,
    padding_mask_crop=60,
    generator=generator,
    cross_attention_kwargs={"scale": 0.8},
    controlnet_conditioning_scale=[0.4, 0.4],
    control_guidance_start=[0.2, 0.2],
    control_guidance_end=[0.8, 0.8],
).images
```
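In the meantime, a manual crop-and-paste can emulate `padding_mask_crop` so the pipeline never sees the incompatible combination. This is an untested sketch: it assumes `pipe.mask_processor.get_crop_region` is available and behaves as in the single-image inpaint pipelines, and it pastes the result back with plain PIL instead of the pipeline's overlay blending:

```python
pad = 60
w, h = init_image.size

# Padded bounding box of the mask (assumed helper, used internally by the
# diffusers inpaint pipelines when padding_mask_crop is set): (x1, y1, x2, y2).
x1, y1, x2, y2 = pipe.mask_processor.get_crop_region(mask_image, w, h, pad=pad)
box = (x1, y1, x2, y2)

result = pipe(
    prompt,
    image=init_image.crop(box),
    mask_image=mask_image.crop(box),
    control_image=[c.crop(box) for c in control_images],
    num_inference_steps=30,
    generator=generator,
    controlnet_conditioning_scale=[0.4, 0.4],
    # no padding_mask_crop here, so the list of control images is accepted
).images[0]

# Paste the inpainted region back into the full-size original.
output = init_image.copy()
output.paste(result.resize((x2 - x1, y2 - y1)), (x1, y1))
```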
Logs
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[21], line 6
4 control_images = [control_image, control_image]
5 mask_image = preprocess_image(Image.open(r"./inpainting_img/Inpainting01_InputImageMask.jpg"))
----> 6 images = pipe(
7 prompt,
8 image=init_image,
9 mask_image = mask_image,
10 guidance_scale = 0.7,
11 control_image=control_images,
12 num_inference_steps=30,
13 num_images_per_prompt = 2,
14 padding_mask_crop = 60,
15 generator=generator,
16 cross_attention_kwargs={"scale":0.8}, device="cuda",
17 controlnet_conditioning_scale=[0.4,0.4],
18 control_guidance_start = [0.2,0.2],
19 control_guidance_end = [0.8,0.8]
20
21 ).images
File c:\Users\SEED360\Desktop\Stable_Diffusion\diffusers\.venv\lib\site-packages\torch\utils\_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File c:\Users\SEED360\Desktop\Stable_Diffusion\diffusers\.venv\lib\site-packages\diffusers\pipelines\controlnet\pipeline_controlnet_inpaint_sd_xl.py:1418, in StableDiffusionXLControlNetInpaintPipeline.__call__(self, prompt, prompt_2, image, mask_image, control_image, height, width, padding_mask_crop, strength, num_inference_steps, denoising_start, denoising_end, guidance_scale, negative_prompt, negative_prompt_2, num_images_per_prompt, eta, generator, latents, prompt_embeds, negative_prompt_embeds, ip_adapter_image, ip_adapter_image_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, output_type, return_dict, cross_attention_kwargs, controlnet_conditioning_scale, guess_mode, control_guidance_start, control_guidance_end, guidance_rescale, original_size, crops_coords_top_left, target_size, aesthetic_score, negative_aesthetic_score, clip_skip, callback_on_step_end, callback_on_step_end_tensor_inputs, **kwargs)
1412 control_guidance_start, control_guidance_end = (
1413 mult * [control_guidance_start],
1414 mult * [control_guidance_end],
1415 )
1417 # 1. Check inputs
-> 1418 self.check_inputs(
1419 prompt,
1420 prompt_2,
1421 control_image,
1422 mask_image,
1423 strength,
1424 num_inference_steps,
1425 callback_steps,
1426 output_type,
1427 negative_prompt,
1428 negative_prompt_2,
1429 prompt_embeds,
1430 negative_prompt_embeds,
1431 ip_adapter_image,
1432 ip_adapter_image_embeds,
1433 pooled_prompt_embeds,
1434 negative_pooled_prompt_embeds,
1435 controlnet_conditioning_scale,
1436 control_guidance_start,
1437 control_guidance_end,
1438 callback_on_step_end_tensor_inputs,
1439 padding_mask_crop,
1440 )
1442 self._guidance_scale = guidance_scale
1443 self._clip_skip = clip_skip
File c:\Users\SEED360\Desktop\Stable_Diffusion\diffusers\.venv\lib\site-packages\diffusers\pipelines\controlnet\pipeline_controlnet_inpaint_sd_xl.py:731, in StableDiffusionXLControlNetInpaintPipeline.check_inputs(self, prompt, prompt_2, image, mask_image, strength, num_inference_steps, callback_steps, output_type, negative_prompt, negative_prompt_2, prompt_embeds, negative_prompt_embeds, ip_adapter_image, ip_adapter_image_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds, controlnet_conditioning_scale, control_guidance_start, control_guidance_end, callback_on_step_end_tensor_inputs, padding_mask_crop)
729 if padding_mask_crop is not None:
730 if not isinstance(image, PIL.Image.Image):
--> 731 raise ValueError(
732 f"The image should be a PIL image when inpainting mask crop, but is of type" f" {type(image)}."
733 )
734 if not isinstance(mask_image, PIL.Image.Image):
735 raise ValueError(
736 f"The mask image should be a PIL image when inpainting mask crop, but is of type"
737 f" {type(mask_image)}."
738 )
ValueError: The image should be a PIL image when inpainting mask crop, but is of type <class 'list'>.
```
System Info
- 🤗 Diffusers version: 0.32.1
- Platform: Windows-10-10.0.22631-SP0
- Running on Google Colab?: No
- Python version: 3.10.9
- PyTorch version (GPU?): 2.3.0+cu118 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.27.0
- Transformers version: 4.44.2
- Accelerate version: 0.34.2
- PEFT version: 0.14.0
- Bitsandbytes version: not installed
- Safetensors version: 0.4.5
- xFormers version: not installed
- Accelerator: NVIDIA RTX A6000, 49140 MiB
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
@sayakpaul @yiyixuxu @DN6
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi @teoyangrui, if this is still an issue, can you please post a minimal reproducible code snippet with the correct formatting? Right now it's really hard to follow because the post mixes a lot of things together.
Also, it seems to me that the problem is not with the multiple ControlNets, as the title says, but with num_images_per_prompt, so can you test it with just num_images_per_prompt=1 (a stripped-down call is sketched below)?
I know it has been quite a while since you posted this issue, so maybe it's resolved by now; let us know if you still have this issue.
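For reference, a hypothetical stripped-down call, reusing the variable names from the script above:

```python
# Same inputs as the report, but a single image per prompt, to isolate the variable.
images = pipe(
    prompt,
    image=init_image,
    mask_image=mask_image,
    control_image=control_images,
    padding_mask_crop=60,
    num_images_per_prompt=1,
    num_inference_steps=30,
    generator=generator,
).images
```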
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I'm getting the same problem, and it doesn't seem to be exclusive to the XL pipeline.