More thorough guidance for multiple IP adapter images/masks and a single IP Adapter
Describe the bug
I'm trying to use a single IP adapter with multiple IP adapter images and masks. This section of the docs gives an example of how I could do that: https://huggingface.co/docs/diffusers/v0.29.0/en/using-diffusers/ip_adapter#ip-adapter-masking
The docs provide the following code:
import torch
from diffusers.utils import load_image
from diffusers.image_processor import IPAdapterMaskProcessor
mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")
output_height = 1024
output_width = 1024
processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=output_height, width=output_width)
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"])
pipeline.set_ip_adapter_scale([[0.7, 0.7]]) # one scale for each image-mask pair
face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")
ip_images = [[face_image1, face_image2]]
masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]  # (1, num_masks, H, W): one entry for the single adapter
generator = torch.Generator(device="cpu").manual_seed(0)
num_images = 1
image = pipeline(
    prompt="2 girls",
    ip_adapter_image=ip_images,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=20,
    num_images_per_prompt=num_images,
    generator=generator,
    cross_attention_kwargs={"ip_adapter_masks": masks},
).images[0]
One important point that should be highlighted is that images/scales/masks must be lists of lists; otherwise we get the following error: "Cannot assign 2 scale_configs to 1 IP-Adapter".
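For example (the flat-list call below is my own illustration of the failing case, assuming the single-adapter setup above):

# Fails: a flat list is read as one scale per loaded adapter, so two entries
# cannot be assigned to a single IP-Adapter
pipeline.set_ip_adapter_scale([0.7, 0.7])

# Works: the outer list indexes the adapter, the inner list holds one scale
# per image-mask pair
pipeline.set_ip_adapter_scale([[0.7, 0.7]])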
That error message is intuitive enough; however, this gets confusing in other sections of the documentation, such as the examples for the set_ip_adapter_scale() function:
# To use original IP-Adapter
scale = 1.0
pipeline.set_ip_adapter_scale(scale)

# To use style block only
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

# To use style+layout blocks
scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

# To use style and layout from 2 reference images
scales = [{"down": {"block_2": [0.0, 1.0]}}, {"up": {"block_0": [0.0, 1.0, 0.0]}}]
pipeline.set_ip_adapter_scale(scales)
Is it possible to use the style and layout from 2 reference images with a single IP Adapter? I tried the following, which builds on the requirement that scales be a list of lists:
# List of lists to support multiple images/scales/masks with a single IP Adapter
scales = [[{"down": {"block_2": [0.0, 1.0]}}, {"up": {"block_0": [0.0, 1.0, 0.0]}}]]
pipeline.set_ip_adapter_scale(scales)

# OR

# Use layout and style from InstantStyle for one image, but a numerical scale value for the other
scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale([[0.5, scale]])
but I get the following error:
TypeError: unsupported operand type(s) for *: 'dict' and 'Tensor'

At:
  /usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py(2725): __call__
  /usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py(549): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/models/attention.py(366): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/models/transformers/transformer_2d.py(440): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_blocks.py(1288): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/models/unets/unet_2d_condition.py(1220): forward
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1527): _call_impl
  /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /usr/local/lib/python3.10/dist-packages/diffusers/pipelines/controlnet/pipeline_controlnet_sd_xl.py(1510): __call__
  /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py(115): decorate_context
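From the traceback, the failure seems to happen where the attention processor multiplies each scale by its mask tensor, which only works for numeric scales. A standalone sketch of the failing operation (my own illustration, not the actual library code):

import torch

scale = {"down": {"block_2": [0.0, 1.0]}}  # InstantStyle-style config
mask = torch.ones(1, 1, 32, 32)            # stand-in for a preprocessed IP adapter mask

# Raises the same error as above:
# TypeError: unsupported operand type(s) for *: 'dict' and 'Tensor'
scale * mask

So it looks like the masking code path expects float scales and never unpacks the per-block dicts.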
Reproduction
- Load single IP Adapter into pipeline
- Use two IP adapter images, two masks, two scales
- Try to use InstantStyle config to set IP Adapter scale
from diffusers import AutoPipelineForText2Image
from diffusers.image_processor import IPAdapterMaskProcessor
from diffusers.utils import load_image
import torch
from PIL import ImageOps
# Subject/Foreground Style/Mask
subject_style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
subject_mask = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
# Background Style/Mask
background_style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
background_mask = ImageOps.invert(subject_mask)
# Load pipeline + IP Adapter
pipeline = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin")
generator = torch.Generator(device="cpu").manual_seed(26)
# Structure of subject, style of background
layout = {"down": {"block_2": [0.0, 1.0]}}
style = {"up": {"block_0": [0.0, 1.0, 0.0]}}
pipeline.set_ip_adapter_scale([[layout, style]])
# Preprocess mask images
processor = IPAdapterMaskProcessor()
ip_adapter_masks = processor.preprocess([subject_mask, background_mask]).cuda() # Might need to set width/height here
ip_adapter_masks = [
ip_adapter_masks.reshape(
1, ip_adapter_masks.shape[0], ip_adapter_masks.shape[2], ip_adapter_masks.shape[3]
)
]
ip_adapter_images = [[subject_style_image, background_style_image]]
image = pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=ip_adapter_images,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
    num_inference_steps=30,
    generator=generator,
    cross_attention_kwargs={"ip_adapter_masks": ip_adapter_masks},
).images[0]
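If per-image InstantStyle configs are not supported with a single adapter, one workaround I can think of (untested; all names below come from the script above) is to load the same checkpoint twice so each image gets its own adapter, which makes the documented one-scale-config-per-adapter form applicable:

# Hypothetical workaround: one adapter instance per image-mask pair
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter_sdxl.bin", "ip-adapter_sdxl.bin"],
)
pipeline.set_ip_adapter_scale([layout, style])  # one scale config per adapter

# Each adapter then takes its own single-image list and (1, 1, H, W) mask
ip_adapter_images = [[subject_style_image], [background_style_image]]
masks = processor.preprocess([subject_mask, background_mask]).cuda()
ip_adapter_masks = [masks[0:1], masks[1:2]]

That said, this would double the memory used by the adapter weights, so first-class guidance for the single-adapter case would still be preferable.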
Logs
No response
System Info
- diffusers version: 0.27.2
- Platform: Linux-6.5.0-1020-gcp-x86_64-with-glibc2.35
- Python version: 3.10.1
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Huggingface_hub version: 0.21.1
- Transformers version: 4.39.2
- Accelerate version: 0.28.0
- xFormers version: 0.0.23.post1
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
@sayakpaul @yiyixuxu