diffusers [SD Img2Img] resize source images to multiple of 8 instead of 32

Now that https://github.com/huggingface/diffusers/pull/505 has been merged, the resolution requirements for img2img are relaxed: image dimensions only need to be a multiple of 8. Sample code:

import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda"
model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 504))  # notice that 504 is not divisible by 32

prompt = "A fantasy landscape, trending on artstation"
generator = torch.Generator(device).manual_seed(42)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
print(image.width, image.height)
image.show()

Before the fix, the result is resized down to 768x480.

After the fix, the result preserves the original 768x504 resolution.
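The sizes above follow from rounding each dimension down to the nearest multiple (32 before the fix, 8 after it). A minimal sketch of that arithmetic, using a hypothetical helper `round_down_to_multiple` (not a diffusers function):

```python
def round_down_to_multiple(size, multiple):
    """Round each dimension of a (width, height) pair down to a multiple.

    Hypothetical helper for illustration only.
    """
    width, height = size
    return (width - width % multiple, height - height % multiple)


# Old behaviour (multiple of 32) vs. new behaviour (multiple of 8)
print(round_down_to_multiple((768, 504), 32))  # (768, 480)
print(round_down_to_multiple((768, 504), 8))   # (768, 504)
```

Images whose sides are already multiples of 32 come out the same under both rules, so only other sizes are affected.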

This change doesn't break the existing tests, but it could affect reproducibility for images whose dimensions aren't multiples of 32, because the latents now have a different shape.
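To make the latent shape change concrete: Stable Diffusion v1 latents have 4 channels and are 8 times smaller than the image in each spatial dimension. A small sketch (the helper `latent_shape` is hypothetical) comparing the latent shapes for the 768x504 example:

```python
VAE_SCALE_FACTOR = 8  # SD v1 VAE downsamples each spatial dimension by 8
LATENT_CHANNELS = 4   # SD v1 latents have 4 channels


def latent_shape(width, height, batch_size=1):
    """Shape (N, C, H, W) of the latents for an image of the given size.

    Hypothetical helper for illustration only.
    """
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(latent_shape(768, 480))  # before the fix: (1, 4, 60, 96)
print(latent_shape(768, 504))  # after the fix:  (1, 4, 63, 96)
```

Because the noise tensor is sampled with the latent shape, the same seed can no longer reproduce the noise used before the fix for such inputs.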

Dec 06 '22 14:12 vvsotnikov

The documentation is not available anymore as the PR was closed or merged.

Dec 06 '22 14:12 HuggingFaceDocBuilderDev

check_repository_consistency failed, so I added this fix to AltDiffusion as well.

Dec 06 '22 14:12 vvsotnikov

Sure! Should I make an fp16 version of the test as well, or would fp32 alone be enough?

Dec 07 '22 10:12 vvsotnikov

The ONNX img2img pipeline actually fails when I use an image whose dimensions are divisible by 8 but not by 16 or 32:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/c/Users/vladimir/AppData/Local/JetBrains/Toolbox/apps/PyCharm-P/ch-0/22 │
│ 3.7571.203/plugins/python/helpers/pydev/pydevconsole.py:364 in runcode       │
│                                                                              │
│   361 │   │   def runcode(self, code):                                       │
│   362 │   │   │   try:                                                       │
│   363 │   │   │   │   func = types.FunctionType(code, self.locals)           │
│ ❱ 364 │   │   │   │   coro = func()                                          │
│   365 │   │   │   │   if inspect.iscoroutine(coro):                          │
│   366 │   │   │   │   │   loop = asyncio.get_event_loop()                    │
│   367 │   │   │   │   │   loop.run_until_complete(coro)                      │
│ <input>:39 in <module>                                                       │
│                                                                              │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/pipelines/stab │
│ le_diffusion/pipeline_onnx_stable_diffusion_img2img.py:408 in __call__       │
│                                                                              │
│   405 │   │   │                                                              │
│   406 │   │   │   # predict the noise residual                               │
│   407 │   │   │   timestep = np.array([t], dtype=timestep_dtype)             │
│ ❱ 408 │   │   │   noise_pred = self.unet(                                    │
│   409 │   │   │   │   sample=latent_model_input, timestep=timestep, encoder_ │
│   410 │   │   │   )[0]                                                       │
│   411                                                                        │
│                                                                              │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/onnx_utils.py: │
│ 61 in __call__                                                               │
│                                                                              │
│    58 │                                                                      │
│    59 │   def __call__(self, **kwargs):                                      │
│    60 │   │   inputs = {k: np.array(v) for k, v in kwargs.items()}           │
│ ❱  61 │   │   return self.model.run(None, inputs)                            │
│    62 │                                                                      │
│    63 │   @staticmethod                                                      │
│    64 │   def load_model(path: Union[str, Path], provider=None, sess_options │
│                                                                              │
│ /home/vladimir/.virtualenvs/diffusers/lib/python3.10/site-packages/onnxrunti │
│ me/capi/onnxruntime_inference_collection.py:200 in run                       │
│                                                                              │
│   197 │   │   if not output_names:                                           │
│   198 │   │   │   output_names = [output.name for output in self._outputs_me │
│   199 │   │   try:                                                           │
│ ❱ 200 │   │   │   return self._sess.run(output_names, input_feed, run_option │
│   201 │   │   except C.EPFail as err:                                        │
│   202 │   │   │   if self._enable_fallback:                                  │
│   203 │   │   │   │   print("EP Error: {} using {}".format(str(err), self._p │
╰──────────────────────────────────────────────────────────────────────────────╯
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while 
running Concat node. Name:'Concat_3588' Status Message: concat.cc:156 
PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched 
dimensions of 5 and 6

I'm not really familiar with ONNX, but I'll try to investigate.

Dec 07 '22 11:12 vvsotnikov

Actually, it looks like the ONNX pipeline can't even handle resolutions that are multiples of 32; only multiples of 64 are supported. The code below uses an init image whose width is a multiple of 32 (but not 64), and it still throws an error similar to the one I shared in my previous message:

import numpy as np
import onnxruntime as ort

from diffusers import OnnxStableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

gpu_provider = (
    "CUDAExecutionProvider",
    {
        "gpu_mem_limit": "15000000000",  # 15GB
        "arena_extend_strategy": "kSameAsRequested",
    },
)
gpu_options = ort.SessionOptions()
gpu_options.enable_mem_pattern = False

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((512 - 32, 512))  # multiple of 32 but not 64
pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider=gpu_provider,
    sess_options=gpu_options,
)
pipe.set_progress_bar_config(disable=None)

prompt = "A fantasy landscape, trending on artstation"

generator = np.random.RandomState(0)
output = pipe(
    prompt=prompt,
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
    num_inference_steps=10,
    generator=generator,
    output_type="np",
)

The same applies to text2image. Could this be related to the way the model is exported to ONNX? The torch.onnx.export() docs say that dynamic control flow isn't preserved when exporting from a torch.nn.Module (which is what scripts/convert_stable_diffusion_checkpoint_to_onnx.py does).
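One way to see why only multiples of 64 would work is to track spatial sizes through the UNet. The sketch below is a simplified model, not the actual exported graph. It assumes the 8x VAE downscaling of SD v1, three stride-2 downsampling stages (3x3 conv, padding 1), and an exported graph that always upsamples by a fixed factor of 2 before concatenating with the skip connection, i.e. that any size-matching branch was lost during export. Under those assumptions, any image side whose latent size is not a multiple of 8 (so an image side that is not a multiple of 8 x 2^3 = 64) leads to a concat mismatch like the one in the traceback. The helper names are hypothetical.

```python
VAE_SCALE_FACTOR = 8      # assumption: SD v1 VAE downsamples by 8
NUM_UNET_DOWNSAMPLES = 3  # assumption: three stride-2 downsampling stages


def conv_stride2(n):
    """Output size of a 3x3 convolution with stride 2 and padding 1."""
    return (n + 2 * 1 - 3) // 2 + 1


def first_concat_mismatch(image_side):
    """Return (skip_size, upsampled_size) at the first mismatching concat, or None.

    Simplified model: the decoder path always upsamples by exactly 2.
    """
    latent = image_side // VAE_SCALE_FACTOR
    sizes = [latent]
    for _ in range(NUM_UNET_DOWNSAMPLES):
        sizes.append(conv_stride2(sizes[-1]))

    # Walk back up: fixed 2x upsampling, then concat with the skip connection
    current = sizes[-1]
    for skip in reversed(sizes[:-1]):
        upsampled = current * 2
        if upsampled != skip:
            return (skip, upsampled)
        current = upsampled
    return None


for side in (512, 504, 480):
    print(side, first_concat_mismatch(side))
# 512 None
# 504 (63, 64)
# 480 (15, 16)
```

The rounding up in the stride-2 convolutions is what breaks the round trip: an odd size at any level comes back one pixel larger after the fixed 2x upsampling.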

Dec 07 '22 14:12 vvsotnikov

Could we for now apply this fix only to StableDiffusionImg2ImgPipeline and AltDiffusionImg2ImgPipeline and keep the ONNX pipeline intact? :) Given that there's already a discrepancy in text2img among these three (the first two can generate a 504x504 image but the ONNX pipeline can't), I don't think a similar discrepancy in img2img would be a problem. Moreover, it's probably better to scale init images fed to the ONNX pipeline to a multiple of 64 instead of 32, as the current implementation can throw errors like the one I posted above.

Dec 07 '22 14:12 vvsotnikov

Totally fine to not add the changes to ONNX! Could we just add one test that shows how to do img2img with a multiple of 8?

Dec 11 '22 16:12 patrickvonplaten

Sure, will do later this week :)

Dec 14 '22 18:12 vvsotnikov

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Jan 08 '23 15:01 github-actions[bot]

Gentle ping @vvsotnikov, happy to assign the PR to myself if you're busy :-)

Jan 12 '23 19:01 patrickvonplaten

@patrickvonplaten sorry for the delay, and thanks for the reminder :) I'd be glad to finish this PR today or tomorrow, although it seems like I don't have permissions to reassign this back to myself 🤔

Jan 12 '23 20:01 vvsotnikov

@patrickvonplaten I've added the tests; however, check_repository_consistency is failing because, unlike the rest, the ONNX Img2Img pipeline can't work with multiples of 8, only multiples of 64. What do you think I should do about that? :)

I'm also unsure why the paint-by-example tests are failing: I haven't changed anything related to this pipeline, and these tests are green when I run them locally.

Jan 12 '23 22:01 vvsotnikov

cc @anton-l for ONNX.

Hmm, quite surprised that the PaintByExample tests are failing here, as those pipelines aren't touched.

Fixed ONNX for now by adding " with 8->64" to its "Copied from" comment; I think that's fine.
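For context, the repository consistency check compares code marked with a `# Copied from ...` comment against its source, and a trailing `with A->B` lets the copy differ by a substitution, which is how " with 8->64" lets the ONNX pipeline keep a multiple-of-64 resize. Below is a simplified illustration of the idea, assuming plain string substitution; the helper name, the sample code line, and the module path in the comment are illustrative, and the real checker may handle patterns differently.

```python
import re


def apply_copy_replacements(source: str, copied_from_comment: str) -> str:
    """Apply the 'with A->B' substitutions from a '# Copied from ...' comment.

    Hypothetical helper: plain str.replace for each pattern, applied in order.
    """
    match = re.search(r"\bwith\s+(.+)$", copied_from_comment)
    if match is None:
        return source  # no substitutions: the copy must match verbatim
    for pattern in match.group(1).split(","):
        old, new = (part.strip() for part in pattern.split("->"))
        source = source.replace(old, new)
    return source


# Illustrative source line and comment (not copied from the diffusers codebase)
original = "w, h = map(lambda x: x - x % 8, (w, h))  # resize to integer multiple of 8"
comment = "# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_img2img.preprocess with 8->64"
print(apply_copy_replacements(original, comment))
# w, h = map(lambda x: x - x % 64, (w, h))  # resize to integer multiple of 64
```

Plain substitution replaces every occurrence, so a pattern like 8->64 only works when the copied code has no other unrelated "8" in it.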

Jan 13 '23 13:01 patrickvonplaten

OK, the tests are all passing now; not sure what was going on there. I also couldn't reproduce the test failures locally -> merging!

Thanks a lot for the PR @vvsotnikov :heart:

This should be very useful for the community!

Jan 13 '23 15:01 patrickvonplaten

Glad to help!

Jan 13 '23 15:01 vvsotnikov
