Tensor size mismatch for non pow of 2 sized image, SD3ControlNetModel

Open Teriks opened this issue 8 months ago • 11 comments

Describe the bug

Certain ControlNet guidance image sizes that are not powers of 2 cause a tensor size mismatch during inference when using SD3ControlNetModel, even when both dimensions are multiples of 8.

Reproduction

import diffusers
import PIL.Image
import os

import torch

os.environ['HF_TOKEN'] = 'your token'

cn = diffusers.SD3ControlNetModel.from_pretrained('InstantX/SD3-Controlnet-Canny')

pipe = diffusers.StableDiffusion3ControlNetPipeline.from_pretrained(
    'stabilityai/stable-diffusion-3-medium-diffusers',
    controlnet=cn)

pipe.enable_sequential_cpu_offload()

# both dimensions are multiples of 8, but neither is a power of 2
output_size = (1376, 920)

not_pow_2 = PIL.Image.new('RGB', output_size)

args = {
    'guidance_scale': 8.0,
    'num_inference_steps': 30,
    'width': output_size[0],
    'height': output_size[1],
    'control_image': not_pow_2,
    'prompt': 'test prompt'
}

pipe(**args)

Logs

REDACT\venv\Lib\site-packages\diffusers\models\attention_processor.py:1584: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  hidden_states = F.scaled_dot_product_attention(
  0%|          | 0/30 [00:49<?, ?it/s]
Traceback (most recent call last):
  File "REDACT\test.py", line 37, in <module>
    pipe(**args)
  File "REDACT\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\pipelines\controlnet_sd3\pipeline_stable_diffusion_3_controlnet.py", line 1020, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, return_dict=False)[0]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACT\venv\Lib\site-packages\diffusers\schedulers\scheduling_flow_match_euler_discrete.py", line 268, in step
    denoised = sample - model_output * sigma
               ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (115) must match the size of tensor b (114) at non-singleton dimension 2
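
A possible reading of the 115 vs 114 mismatch, sketched under assumptions that are not confirmed in this report: the VAE downscales by a factor of 8 and the SD3 transformer splits latents into 2x2 patches. With those assumed values, a height of 920 gives a latent height of 115, which is odd, so one latent row is lost when the patches are reassembled. The helper names below are hypothetical and only illustrate the arithmetic.

```python
# Assumed values (not stated in the issue): VAE downscale factor 8 and
# transformer patch size 2, as in the SD3 medium configuration.
VAE_SCALE_FACTOR = 8
PATCH_SIZE = 2


def latent_size(pixels: int) -> int:
    """Latent length along one axis for a given pixel length."""
    return pixels // VAE_SCALE_FACTOR


def patchified_size(pixels: int) -> int:
    """Latent length after splitting into patches and reassembling them."""
    return (latent_size(pixels) // PATCH_SIZE) * PATCH_SIZE


def snap_down(pixels: int, multiple: int = VAE_SCALE_FACTOR * PATCH_SIZE) -> int:
    """Largest multiple of `multiple` that does not exceed `pixels`."""
    return (pixels // multiple) * multiple


width, height = 1376, 920
print(latent_size(width), patchified_size(width))    # 172 172
print(latent_size(height), patchified_size(height))  # 115 114
print(snap_down(height))                             # 912
```

If that reading is right, sizes that are multiples of 16 (for example 1376 x 912) should avoid the error, which would make the trigger divisibility by 16 rather than being a power of 2.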

System Info

Platform: Windows

Python 3.12.3
diffusers 0.29.1
transformers 4.41.2
accelerate 0.31.0

Who can help?

@sayakpaul @yiyixuxu @DN6

Teriks, Jun 23 '24 19:06