ComfyUI
Flux ControlNet reports an error
Expected Behavior
Img2img with a Flux ControlNet should sample without errors.
Actual Behavior
Sampling fails with an einops shape-mismatch error (see the debug logs below).
Steps to Reproduce
Debug Logs
!!! Exception during processing!!! Error while processing rearrange-reduction pattern "b c (h ph) (w pw) -> b (h w) (c ph pw)".
Input tensor shape: torch.Size([1, 16, 225, 150]). Additional info: {'ph': 2, 'pw': 2}.
Shape mismatch, can't divide axis of length 225 in chunks of 2
Traceback (most recent call last):
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 523, in reduce
return _apply_recipe(
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 234, in _apply_recipe
init_shapes, axes_reordering, reduced_axes, added_axes, final_shapes, n_axes_w_added = _reconstruct_from_shape(
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 187, in _reconstruct_from_shape_uncached
raise EinopsError(f"Shape mismatch, can't divide axis of length {length} in chunks of {known_product}")
einops.EinopsError: Shape mismatch, can't divide axis of length 225 in chunks of 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\comfyui\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\comfyui\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\comfyui\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\comfyui\comfyui\comfy_extras\nodes_custom_sampler.py", line 612, in sample
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 716, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 695, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 600, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "E:\comfyui\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\k_diffusion\sampling.py", line 143, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 299, in __call__
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 682, in __call__
return self.predict_noise(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 685, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 279, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 202, in calc_cond_batch
c['control'] = control.get_control(input_x, timestep_, c, len(cond_or_uncond))
File "E:\comfyui\ComfyUI\comfy\controlnet.py", line 238, in get_control
control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
File "E:\comfyui\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\comfyui\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\ldm\flux\controlnet_xlabs.py", line 104, in forward
return self.forward_orig(img, img_ids, hint, context, txt_ids, timesteps, y, guidance)
File "E:\comfyui\ComfyUI\comfy\ldm\flux\controlnet_xlabs.py", line 62, in forward_orig
controlnet_cond = rearrange(controlnet_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 591, in rearrange
return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 533, in reduce
raise EinopsError(message + "\n {}".format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "b c (h ph) (w pw) -> b (h w) (c ph pw)".
Input tensor shape: torch.Size([1, 16, 225, 150]). Additional info: {'ph': 2, 'pw': 2}.
Shape mismatch, can't divide axis of length 225 in chunks of 2
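For reference, the failure reduces to a single einops call: the Flux ControlNet patchifies the latent into 2x2 patches, and a 225-long axis (an 1800-pixel image side after the VAE's 8x downsampling) cannot be split into pairs. A minimal sketch reproducing just the error, outside ComfyUI:

    import torch
    from einops import rearrange

    # Same latent shape as in the log above; height 225 is odd.
    x = torch.zeros(1, 16, 225, 150)

    # The 2x2 patchify step fails because 225 is not divisible by 2:
    # einops.EinopsError: Shape mismatch, can't divide axis of length 225 in chunks of 2
    rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)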
Other
No response
You are using image-to-image and ControlNet together, which is not the way it is intended. Create an empty latent image instead and connect it to the samples input, and you should be good to go.
Of course you can do img2img and ControlNet together in FLUX. Just resize the input image's longer axis accordingly. If you have an image of 1544 x 1544, for example, it does not work because 1544 / 8 = 193, and 193 is not divisible by 2 without a remainder. Then you get the error: torch.Size([1, 16, 193, 193]). Additional info: {'ph': 2, 'pw': 2}. Shape mismatch, can't divide axis of length 193 in chunks of 2.
If you resize your input image to 1520 x 1520, however, then you get 1520 / 8 = 190, and 190 is divisible by 2. Then it works.
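A quick way to check a resolution before running the workflow (the helper below is illustrative, not part of ComfyUI): the VAE downsamples by 8 and the ControlNet patchifies by 2, so each image side must be divisible by 8 * 2 = 16.

    def side_is_valid(side: int) -> bool:
        # VAE downsamples by 8, then 2x2 patchify:
        # each image side must be a multiple of 16.
        return side % 16 == 0

    print(1544 / 8, side_is_valid(1544))  # 193.0 False -> fails
    print(1520 / 8, side_is_valid(1520))  # 190.0 True  -> works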
I found an alternative solution: simply pad the tensor if it's not divisible by 2 without a remainder.
Update ComfyUI\comfy\ldm\flux\controlnet.py
Replace the "forward" function with this. I didn't test this on all resolutions, but I believe it should work. It allowed me to do 1920x1080 resolution.
def forward(self, x, timesteps, context, y, guidance=None, hint=None, **kwargs):
    patch_size = 2

    def pad_if_needed(tensor):
        _, _, h, w = tensor.shape
        pad_h = (patch_size - (h % patch_size)) % patch_size
        pad_w = (patch_size - (w % patch_size)) % patch_size
        if pad_h > 0 or pad_w > 0:
            return torch.nn.functional.pad(tensor, (0, pad_w, 0, pad_h))
        return tensor

    if self.latent_input:
        hint = comfy.ldm.common_dit.pad_to_patch_size(hint, (patch_size, patch_size))
    elif self.mistoline:
        hint = hint * 2.0 - 1.0
        hint = self.input_cond_block(hint)
    else:
        hint = hint * 2.0 - 1.0
        hint = self.input_hint_block(hint)

    hint = pad_if_needed(hint)  # Add padding if needed
    hint = rearrange(hint, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)

    bs, c, h, w = x.shape
    x = pad_if_needed(x)  # Add padding if needed
    img = rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)

    h_len = ((h + (patch_size - 1)) // patch_size)
    w_len = ((w + (patch_size - 1)) // patch_size)
    img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
    img_ids[..., 1] = img_ids[..., 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype)[:, None]
    img_ids[..., 2] = img_ids[..., 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype)[None, :]
    img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)

    txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
    return self.forward_orig(img, img_ids, hint, context, txt_ids, timesteps, y, guidance, control_type=kwargs.get("control_type", []))
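As a quick sanity check of the padding logic (a standalone copy of the nested helper, with the latent shape assumed from a 1920x1080 image after the VAE's 8x downsampling):

    import torch

    patch_size = 2

    def pad_if_needed(tensor):
        # Standalone copy of the helper defined inside forward() above.
        _, _, h, w = tensor.shape
        pad_h = (patch_size - (h % patch_size)) % patch_size
        pad_w = (patch_size - (w % patch_size)) % patch_size
        if pad_h > 0 or pad_w > 0:
            # (left, right, top, bottom) padding on the last two axes.
            return torch.nn.functional.pad(tensor, (0, pad_w, 0, pad_h))
        return tensor

    # 1920x1080 -> latent of 135x240; the odd height gets padded to 136.
    x = torch.zeros(1, 16, 135, 240)
    print(pad_if_needed(x).shape)  # torch.Size([1, 16, 136, 240])

Note that h_len and w_len use ceiling division on the pre-pad height and width, so the positional ids match the padded token grid (135 -> 68 rows, same as 136 / 2).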
That's great! Thank you!
Resize the image to a multiple of 16; then it should work. (There may be slight cropping.)
BTW, the padding fix above could work too.
Your suggestion worked like a charm! Thanks!
This works well. Thank you very much.