Unexpected behavior of `image_processor.apply_overlay`

Open cortwave opened this issue 1 year ago • 11 comments

Describe the bug

The apply_overlay method of image_processor behaves unexpectedly when init_image and image have different sizes. This can happen, for example, when using StableDiffusionXLInpaintPipeline with padding_mask_crop set.

According to the apply_overlay code, it resizes init_image to the size of image and does all further work in that coordinate system. If crop_coordinates are given in init_image coordinates (which is the expected and intuitive choice, and exactly the coordinate system StableDiffusionXLInpaintPipeline uses), the image is pasted into the wrong place.
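The mismatch described above can be shown with plain arithmetic. This is only a sketch of the reported behavior, not the library's code: the image sizes and crop box are hypothetical values chosen for illustration, and relative_box is a helper introduced here, not part of diffusers.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
Size = Tuple[int, int]           # (width, height) in pixels


def relative_box(box: Box, frame: Size) -> Tuple[float, float, float, float]:
    """Express a pixel box as fractions of the frame it is pasted into."""
    x1, y1, x2, y2 = box
    width, height = frame
    return (x1 / width, y1 / height, x2 / width, y2 / height)


# Hypothetical sizes, assumed only for this illustration.
init_size: Size = (512, 512)             # original image
output_size: Size = (1024, 1024)         # generated image for the cropped region
crop_coords: Box = (100, 200, 300, 400)  # expressed in init_image coordinates

# Reported behavior: the canvas takes the generated image's size, yet the
# patch is pasted at crop_coords unchanged.
reported = relative_box(crop_coords, output_size)

# Expected behavior: the canvas keeps the original image's size, so
# crop_coords address the region they were computed for.
expected = relative_box(crop_coords, init_size)

print("reported:", reported)
print("expected:", expected)
```

With these assumed numbers, the patch covers a different fraction of the canvas in the two cases, both in position and in extent, which matches the misplaced crop reported in this issue.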

Reproduction

This code follows the example from the official documentation.

import numpy as np
import torch
from PIL import Image

from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

# Build a rectangular mask: white (255) marks the region to inpaint.
inpainting_mask = np.zeros_like(init_image)
inpainting_mask[210:555, 71:600] = 255
inpainting_map = Image.fromarray(inpainting_mask)

base_model_name: str = "diffusers/stable-diffusion-xl-1.0-inpainting-0.1"
pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# padding_mask_crop makes the pipeline inpaint a crop around the mask
# and then overlay the result onto the original image.
image = pipe(
    "cat",
    image=init_image,
    mask_image=inpainting_map,
    strength=0.99,
    num_inference_steps=20,
    guidance_scale=8.5,
    padding_mask_crop=16,
).images[0]

image.save("output.png")

Output

[Image: output]

As we can see, the crop was pasted into the wrong location: the crop_coordinates used in the apply_overlay method are in the original image's coordinate system, but everything was resized to the size of the cropped image.

Logs

No response

System Info

  • 🤗 Diffusers version: 0.28.2
  • Platform: Ubuntu 22.04.4 LTS - Linux-5.15.0-113-generic-x86_64-with-glibc2.35
  • Running on a notebook?: No
  • Running on Google Colab?: No
  • Python version: 3.10.14
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.24.0
  • Transformers version: 4.41.2
  • Accelerate version: 0.22.0
  • PEFT version: 0.11.1
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.3
  • xFormers version: 0.0.23
  • Accelerator: NVIDIA RTX A5000, 24564 MiB VRAM
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

No response

cortwave, Aug 13 '24 11:08