diffusers
[SD Img2Img] resize source images to multiple of 8 instead of 32
Now that https://github.com/huggingface/diffusers/pull/505 is merged, the resolution requirements for img2img are relaxed: image dimensions only need to be a multiple of 8 instead of 32. Sample code:
import requests
import torch
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionImg2ImgPipeline
device = "cuda"
model_id_or_path = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_id_or_path,
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)
# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 504)) # notice that 504 is not divisible by 32
prompt = "A fantasy landscape, trending on artstation"
generator = torch.Generator(device).manual_seed(42)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5, generator=generator).images[0]
print(image.width, image.height)
image.show()
The result before the fix is resized down to 768*480:
The result after the fix preserves the original 768*504 resolution:
This change doesn't break the tests, but it could affect reproducibility for inputs that are not a multiple of 32, since the latents' shape is different now.
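To make the size arithmetic behind the two results concrete, here is a minimal sketch. `round_down` and `latent_hw` are hypothetical helpers written for illustration, not diffusers functions, and the factor of 8 between pixel and latent resolution is an assumption based on the Stable Diffusion v1 VAE.

```python
def round_down(size: int, multiple: int) -> int:
    """Largest multiple of `multiple` that does not exceed `size`."""
    return size - size % multiple


def latent_hw(width: int, height: int, multiple: int, vae_scale: int = 8):
    """Latent (height, width) after rounding the image down to `multiple`.

    `vae_scale=8` is an assumption (Stable Diffusion v1 VAE downsampling).
    """
    w = round_down(width, multiple)
    h = round_down(height, multiple)
    return h // vae_scale, w // vae_scale


# Rounding to a multiple of 32 shrinks the 768x504 input, rounding to 8 keeps it.
print(round_down(768, 32), round_down(504, 32))  # 768 480
print(round_down(768, 8), round_down(504, 8))    # 768 504

# The latent height differs between the two cases, which changes the noise tensor.
print(latent_hw(768, 504, 32))  # (60, 96)
print(latent_hw(768, 504, 8))   # (63, 96)
```

Because the latent height goes from 60 to 63 for this input, the noise drawn from the same seed has a different shape, which is presumably why outputs for such inputs may not match earlier runs.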
check_repository_consistency failed, so I added this fix to AltDiffusion as well.
Sure! Should I make an fp16 version of the test as well, or would fp32 only be enough?
The ONNX img2img pipeline is actually failing when I try to use an image whose size is divisible by 8 but not by 16 or 32:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/c/Users/vladimir/AppData/Local/JetBrains/Toolbox/apps/PyCharm-P/ch-0/22 │
│ 3.7571.203/plugins/python/helpers/pydev/pydevconsole.py:364 in runcode │
│ │
│ 361 │ │ def runcode(self, code): │
│ 362 │ │ │ try: │
│ 363 │ │ │ │ func = types.FunctionType(code, self.locals) │
│ ❱ 364 │ │ │ │ coro = func() │
│ 365 │ │ │ │ if inspect.iscoroutine(coro): │
│ 366 │ │ │ │ │ loop = asyncio.get_event_loop() │
│ 367 │ │ │ │ │ loop.run_until_complete(coro) │
│ <input>:39 in <module> │
│ │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/pipelines/stab │
│ le_diffusion/pipeline_onnx_stable_diffusion_img2img.py:408 in __call__ │
│ │
│ 405 │ │ │ │
│ 406 │ │ │ # predict the noise residual │
│ 407 │ │ │ timestep = np.array([t], dtype=timestep_dtype) │
│ ❱ 408 │ │ │ noise_pred = self.unet( │
│ 409 │ │ │ │ sample=latent_model_input, timestep=timestep, encoder_ │
│ 410 │ │ │ )[0] │
│ 411 │
│ │
│ /mnt/c/Users/vladimir/PycharmProjects/diffusers/src/diffusers/onnx_utils.py: │
│ 61 in __call__ │
│ │
│ 58 │ │
│ 59 │ def __call__(self, **kwargs): │
│ 60 │ │ inputs = {k: np.array(v) for k, v in kwargs.items()} │
│ ❱ 61 │ │ return self.model.run(None, inputs) │
│ 62 │ │
│ 63 │ @staticmethod │
│ 64 │ def load_model(path: Union[str, Path], provider=None, sess_options │
│ │
│ /home/vladimir/.virtualenvs/diffusers/lib/python3.10/site-packages/onnxrunti │
│ me/capi/onnxruntime_inference_collection.py:200 in run │
│ │
│ 197 │ │ if not output_names: │
│ 198 │ │ │ output_names = [output.name for output in self._outputs_me │
│ 199 │ │ try: │
│ ❱ 200 │ │ │ return self._sess.run(output_names, input_feed, run_option │
│ 201 │ │ except C.EPFail as err: │
│ 202 │ │ │ if self._enable_fallback: │
│ 203 │ │ │ │ print("EP Error: {} using {}".format(str(err), self._p │
╰──────────────────────────────────────────────────────────────────────────────╯
Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while
running Concat node. Name:'Concat_3588' Status Message: concat.cc:156
PrepareForCompute Non concat axis dimensions must match: Axis 2 has mismatched
dimensions of 5 and 6
I'm not really familiar with ONNX but I'll try to investigate.
Actually, it looks like the ONNX pipeline can't even work with resolutions that are multiples of 32; only multiples of 64 are supported. This code uses an init image whose width is a multiple of 32 but not 64, and it still throws an error similar to the one I shared in the previous message:
import numpy as np
import onnxruntime as ort
from diffusers import OnnxStableDiffusionImg2ImgPipeline
from diffusers.utils import load_image
gpu_provider = (
    "CUDAExecutionProvider",
    {
        "gpu_mem_limit": "15000000000",  # 15GB
        "arena_extend_strategy": "kSameAsRequested",
    },
)
gpu_options = ort.SessionOptions()
gpu_options.enable_mem_pattern = False
init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/img2img/sketch-mountains-input.jpg"
)
init_image = init_image.resize((512 - 32, 512)) # multiple of 32 but not 64
pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="onnx",
    provider=gpu_provider,
    sess_options=gpu_options,
)
pipe.set_progress_bar_config(disable=None)
prompt = "A fantasy landscape, trending on artstation"
generator = np.random.RandomState(0)
output = pipe(
    prompt=prompt,
    image=init_image,
    strength=0.75,
    guidance_scale=7.5,
    num_inference_steps=10,
    generator=generator,
    output_type="np",
)
This applies to text2image too. Could it be related to the way the model is exported to ONNX format? The torch.onnx.export() docs say that it doesn't preserve dynamic control flow when exporting from a torch.nn.Module (which is the case for scripts/convert_stable_diffusion_checkpoint_to_onnx.py).
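One possible explanation for the multiple-of-64 limit, offered as a sketch rather than a confirmed diagnosis: if the exported graph always upsamples by a fixed factor of 2 instead of matching the skip connection's size, any odd intermediate size produces a Concat mismatch. The simulation below assumes a VAE factor of 8, three stride-2 downsamplers that map a size n to ceil(n / 2), and fixed 2x upsampling. `first_skip_mismatch` is a hypothetical helper, not part of diffusers or ONNX Runtime.

```python
import math

VAE_SCALE = 8         # assumption: 8x8 pixels per latent cell (SD v1 VAE)
NUM_DOWNSAMPLERS = 3  # assumption: the SD v1 UNet halves the latent three times


def first_skip_mismatch(pixels: int):
    """Return (upsampled, skip) sizes at the first mismatch, or None if all match."""
    size = pixels // VAE_SCALE
    skips = []
    for _ in range(NUM_DOWNSAMPLERS):
        skips.append(size)          # size stored for the skip connection
        size = math.ceil(size / 2)  # stride-2 conv with padding: ceil(n / 2)
    for skip in reversed(skips):
        size *= 2                   # fixed 2x upsampling, no size matching
        if size != skip:
            return size, skip
    return None


for side in (512, 448, 480, 504):
    print(side, first_skip_mismatch(side))
# 512 None
# 448 None
# 480 (16, 15)
# 504 (64, 63)
```

Under these assumptions only sides that are multiples of 8 * 2**3 = 64 pass through cleanly, which matches the behaviour reported above for the ONNX pipeline.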
Could we for now apply this fix only to StableDiffusionImg2ImgPipeline and AltDiffusionImg2ImgPipeline, and keep the ONNX pipeline intact? :) Given that there's already a discrepancy in text2img between these three (the first two can generate a 504x504 image but the ONNX pipeline can't), I don't think having a similar discrepancy in img2img would be a problem. Moreover, it's probably better to scale init images to a multiple of 64 instead of 32 when feeding them to the ONNX pipeline, as the current implementation can throw errors like the one I posted above.
Totally fine to not add the changes to ONNX! Could we just please add one test that shows how to do img2img with a multiple of 8?
Sure, will do later this week :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Gentle ping @vvsotnikov, happy to assign the PR to myself if you're busy :-)
@patrickvonplaten sorry for the delay, and thanks for reminding :) I'd be glad to finish this PR today or tomorrow, although it seems like I don't have permissions to reassign this back to myself 🤔
@patrickvonplaten I've added the tests; however, check_repository_consistency is failing because, unlike the rest, the ONNX Img2Img pipeline can't work with multiples of 8, only 64. What do you think I should do about that? :)
Also unsure why paint-by-example tests are failing - I haven't changed anything related to this pipeline, and these tests are green when I run them locally.
cc @anton-l for ONNX.
Hmm quite surprised that the PaintByExample tests are failing here as those pipelines aren't touched.
Fixed ONNX for now by adding " with 8->64", think that's fine
Ok tests are now all passing, not sure what was going on there. Also couldn't reproduce test failures locally -> merging!
Thanks a lot for the PR @vvsotnikov :heart:
This should be very useful for the community!
Glad to help!