ComfyUI
Flux ControlNet reports an error
Expected Behavior
Img2img with a Flux ControlNet should sample without errors.
Actual Behavior
Sampling fails with an einops shape-mismatch error (see the debug logs below).
Steps to Reproduce
Debug Logs
!!! Exception during processing!!! Error while processing rearrange-reduction pattern "b c (h ph) (w pw) -> b (h w) (c ph pw)".
Input tensor shape: torch.Size([1, 16, 225, 150]). Additional info: {'ph': 2, 'pw': 2}.
Shape mismatch, can't divide axis of length 225 in chunks of 2
Traceback (most recent call last):
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 523, in reduce
return _apply_recipe(
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 234, in _apply_recipe
init_shapes, axes_reordering, reduced_axes, added_axes, final_shapes, n_axes_w_added = _reconstruct_from_shape(
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 187, in _reconstruct_from_shape_uncached
raise EinopsError(f"Shape mismatch, can't divide axis of length {length} in chunks of {known_product}")
einops.EinopsError: Shape mismatch, can't divide axis of length 225 in chunks of 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\comfyui\ComfyUI\execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\comfyui\ComfyUI\execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\comfyui\ComfyUI\execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\comfyui\comfyui\comfy_extras\nodes_custom_sampler.py", line 612, in sample
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 716, in sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 695, in inner_sample
samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 600, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "E:\comfyui\python_embeded\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\k_diffusion\sampling.py", line 143, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 299, in __call__
out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 682, in __call__
return self.predict_noise(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 685, in predict_noise
return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 279, in sampling_function
out = calc_cond_batch(model, conds, x, timestep, model_options)
File "E:\comfyui\ComfyUI\comfy\samplers.py", line 202, in calc_cond_batch
c['control'] = control.get_control(input_x, timestep_, c, len(cond_or_uncond))
File "E:\comfyui\ComfyUI\comfy\controlnet.py", line 238, in get_control
control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
File "E:\comfyui\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\comfyui\python_embeded\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "E:\comfyui\ComfyUI\comfy\ldm\flux\controlnet_xlabs.py", line 104, in forward
return self.forward_orig(img, img_ids, hint, context, txt_ids, timesteps, y, guidance)
File "E:\comfyui\ComfyUI\comfy\ldm\flux\controlnet_xlabs.py", line 62, in forward_orig
controlnet_cond = rearrange(controlnet_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 591, in rearrange
return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
File "E:\comfyui\python_embeded\lib\site-packages\einops\einops.py", line 533, in reduce
raise EinopsError(message + "\n {}".format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "b c (h ph) (w pw) -> b (h w) (c ph pw)".
Input tensor shape: torch.Size([1, 16, 225, 150]). Additional info: {'ph': 2, 'pw': 2}.
Shape mismatch, can't divide axis of length 225 in chunks of 2
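For reference, the failure reduces to a single einops call: the Flux ControlNet patchifies the latent into 2x2 patches, and a 225-long axis (an 1800-pixel image side after the VAE's 8x downsampling) cannot be split into pairs. A minimal sketch reproducing just the error, outside ComfyUI:

    import torch
    from einops import rearrange

    # Same latent shape as in the log above; height 225 is odd.
    x = torch.zeros(1, 16, 225, 150)

    # The 2x2 patchify step fails because 225 is not divisible by 2:
    # einops.EinopsError: Shape mismatch, can't divide axis of length 225 in chunks of 2
    rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)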
Other
No response
You are using image-to-image and ControlNet together, which is not the way it is intended. Create an empty latent image instead and connect it to the samples input, and you should be good to go.
Of course you can do img2img and ControlNet together in FLUX. Just resize the input image's longer axis accordingly. If you have an image of 1544 x 1544, for example, it does not work because 1544 / 8 = 193, and 193 is not divisible by 2 without a remainder. Then you get the error: torch.Size([1, 16, 193, 193]). Additional info: {'ph': 2, 'pw': 2}. Shape mismatch, can't divide axis of length 193 in chunks of 2.
If you resize your input image to 1520 x 1520, however, then you get 1520 / 8 = 190, and 190 is divisible by 2. Then it works.
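A quick way to check a resolution before running the workflow (the helper below is illustrative, not part of ComfyUI): the VAE downsamples by 8 and the ControlNet patchifies by 2, so each image side must be divisible by 8 * 2 = 16.

    def side_is_valid(side: int) -> bool:
        # VAE downsamples by 8, then 2x2 patchify:
        # each image side must be a multiple of 16.
        return side % 16 == 0

    print(1544 / 8, side_is_valid(1544))  # 193.0 False -> fails
    print(1520 / 8, side_is_valid(1520))  # 190.0 True  -> works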
I found an alternative solution: simply pad the tensor if it's not divisible by 2 without a remainder.
Update ComfyUI\comfy\ldm\flux\controlnet.py
Replace the "forward" function with this. I didn't test this on all resolutions, but I believe it should work. It allowed me to do 1920x1080 resolution.
def forward(self, x, timesteps, context, y, guidance=None, hint=None, **kwargs):
    patch_size = 2

    def pad_if_needed(tensor):
        _, _, h, w = tensor.shape
        pad_h = (patch_size - (h % patch_size)) % patch_size
        pad_w = (patch_size - (w % patch_size)) % patch_size
        if pad_h > 0 or pad_w > 0:
            return torch.nn.functional.pad(tensor, (0, pad_w, 0, pad_h))
        return tensor

    if self.latent_input:
        hint = comfy.ldm.common_dit.pad_to_patch_size(hint, (patch_size, patch_size))
    elif self.mistoline:
        hint = hint * 2.0 - 1.0
        hint = self.input_cond_block(hint)
    else:
        hint = hint * 2.0 - 1.0
        hint = self.input_hint_block(hint)

    hint = pad_if_needed(hint)  # Add padding if needed
    hint = rearrange(hint, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)

    bs, c, h, w = x.shape
    x = pad_if_needed(x)  # Add padding if needed
    img = rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)

    h_len = ((h + (patch_size - 1)) // patch_size)
    w_len = ((w + (patch_size - 1)) // patch_size)
    img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
    img_ids[..., 1] = img_ids[..., 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype)[:, None]
    img_ids[..., 2] = img_ids[..., 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype)[None, :]
    img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)

    txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
    return self.forward_orig(img, img_ids, hint, context, txt_ids, timesteps, y, guidance, control_type=kwargs.get("control_type", []))
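As a quick sanity check of the padding logic (a standalone copy of the nested helper, with the latent shape assumed from a 1920x1080 image after the VAE's 8x downsampling):

    import torch

    patch_size = 2

    def pad_if_needed(tensor):
        # Standalone copy of the helper defined inside forward() above.
        _, _, h, w = tensor.shape
        pad_h = (patch_size - (h % patch_size)) % patch_size
        pad_w = (patch_size - (w % patch_size)) % patch_size
        if pad_h > 0 or pad_w > 0:
            # (left, right, top, bottom) padding on the last two axes.
            return torch.nn.functional.pad(tensor, (0, pad_w, 0, pad_h))
        return tensor

    # 1920x1080 -> latent of 135x240; the odd height gets padded to 136.
    x = torch.zeros(1, 16, 135, 240)
    print(pad_if_needed(x).shape)  # torch.Size([1, 16, 136, 240])

Note that h_len and w_len use ceiling division on the pre-pad height and width, so the positional ids match the padded token grid (135 -> 68 rows, same as 136 / 2).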
That's great! Thank you!
Resize the image to a multiple of 16; then it should work. (There may be slight cropping.)
BTW, the padding fix above could work too.
Your suggestion worked like a charm! Thanks!
This works well. Thank you very much.