img2img alternative script support for SDXL
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
When trying to use the img2img alternative test script with the SDXL base model, the following error is raised:
img2imgalt.py", line 85, in find_noise_for_image_sigma_adjustment cond_in = torch.cat([uncond, cond]) ^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError: expected Tensor as element 0 in argument 0, but got dict
Steps to reproduce the problem
- Load the SDXL 1.0 base model
- Go to the img2img tab and insert any image
- Load the img2img alternative test script
- Try to generate an image
What should have happened?
The script should create a noise pattern based on the image.
Version or Commit where the problem happens
1.5.1
What Python version are you running on ?
Python 3.11.x (or above; not yet supported)
What platforms do you use to access the UI ?
Windows
What device are you running WebUI on?
Nvidia GPUs (RTX 20 series and above)
Cross attention optimization
Automatic
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
None
List of extensions
None
Console logs
---
0%| | 0/50 [00:00<?, ?it/s]
*** Error completing request
*** Arguments: ('task(8znmuv88fptwc5g)', 0, '1', '', [], <PIL.Image.Image image mode=RGBA size=512x512 at 0x1599DEE5890>, None, None, None, None, None, None, 20, 0, 4, 0, 1, False, False, 1, 1, 7, 1.5, 0.75, -1.0, -1.0, 0, 0, 0, False, 0, 512, 512, 1, 0, 0, 32, 0, '', '', '', [], False, [], '', <gradio.routes.Request object at 0x000001599DEE5FD0>, 1, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001581B7AD4D0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x00000157DCEB88D0>, <scripts.controlnet_ui.controlnet_ui_group.UiControlNetUnit object at 0x000001599E1B9D90>, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, True, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, True, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, None, None, False, None, None, False, None, None, False, 50) {}
Traceback (most recent call last):
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\call_queue.py", line 58, in f
res = list(func(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\call_queue.py", line 37, in f
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\img2img.py", line 230, in img2img
processed = modules.scripts.scripts_img2img.run(p, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\scripts.py", line 501, in run
processed = script.run(p, *script_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\scripts\img2imgalt.py", line 216, in run
processed = processing.process_images(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\processing.py", line 677, in process_images
res = process_images_inner(p)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\modules\processing.py", line 794, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\scripts\img2imgalt.py", line 188, in sample_extra
rec_noise = find_noise_for_image_sigma_adjustment(p, cond, uncond, cfg, st)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\scripts\img2imgalt.py", line 85, in find_noise_for_image_sigma_adjustment
cond_in = torch.cat([uncond, cond])
^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected Tensor as element 0 in argument 0, but got dict
---
Additional information
I tried making these additions to the script:
cond_tensor = cond['crossattn']
uncond_tensor = uncond['crossattn']
cond_in = torch.cat([uncond_tensor, cond_tensor], dim=1)
cond_in = {"c_concat": cond_in}
cond['crossattn'] = cond_in
uncond['crossattn'] = cond_in
This resolved some errors but ultimately led to this one:
File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\wrappers.py", line 28, in forward return self.diffusion_model( ^^^^^^^^^^^^^^^^^^^^^ File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\venv\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\X Drive\MachineLearning\Stable Diffusion\I dont even know anymore\LatestBuildForTesting\stable-diffusion-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 979, in forward assert (y is not None) == ( ^^^^^^^^^^^^^^^^^^^^ AssertionError: must specify y if and only if the model is class-conditional
Would love for this to be fixed; it would be very helpful for consistent animation across frames.
Any ideas on how to fix this?
Notify me when this is fixed; I also want to animate things, please!
No news on whether this has been fixed?
Still getting this.
Still getting this.
It’ll never be fixed.
This tool has some of the worst code organization I've ever seen in my life.
I was having the same issue and searched all over the internet. Eventually I figured out that only specific checkpoints can handle the img2img alternative script; for example, arcane-diffusion-v3 worked for me.
It looks like shared.sd_model.apply_model isn't applicable to Stable Diffusion XL models, so there must be some other method to get the eps value.
The original method being called for SD 1.5 is hooked via modules.sd_hijack_utils and lives on ldm.models.diffusion.ddpm.LatentDiffusion:
https://github.com/CompVis/latent-diffusion/blob/main/ldm/models/diffusion/ddpm.py
which clearly does a lot more. With the assertions disabled, it leads to an error in the forward call (RuntimeError: mat1 and mat2 shapes cannot be multiplied (7950x640 and 2048x640)), so the forward call is clearly not correct.
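For reference, the script's SD 1.5 path builds a dict-of-lists conditioning (the prints below access cond_in["c_concat"][0] and cond_in["c_crossattn"][0]) and then calls something like:

# Roughly the existing SD 1.5 call in img2imgalt.py:
eps = shared.sd_model.apply_model(x_in * c_in, t, cond=cond_in)

Under SDXL the model instead expects the "concat"/"crossattn"/"vector" dict consumed by OpenAIWrapper, which is what the experiments below poke at.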
Similar errors show up elsewhere, though they may not be relevant:
https://github.com/Mikubill/sd-webui-controlnet/issues/634
https://github.com/Mikubill/sd-webui-controlnet/issues/5
The bigger issue might be this:
class OpenAIWrapper(IdentityWrapper):
    def forward(
        self, x: torch.Tensor, t: torch.Tensor, c: dict, **kwargs
    ) -> torch.Tensor:
        x = torch.cat((x, c.get("concat", torch.Tensor([]).type_as(x))), dim=1)
        return self.diffusion_model(
            x,
            timesteps=t,
            context=c.get("crossattn", None),
            y=c.get("vector", None),
            **kwargs,
        )
So then, maybe we can do something like
print((x_in * c_in).shape)
print((cond_in["c_concat"][0]).shape)
print((cond_in["c_crossattn"][0]).shape)
#eps = shared.sd_model.model.diffusion_model(torch.cat([x_in * c_in, p.image_conditioning]), timesteps=t, context=cond_in, y=None)
eps = shared.sd_model.model(x_in * c_in, t, {"concat": cond_in["c_concat"][0], "crossattn": cond_in["c_crossattn"][0]} )
but the shapes are:

torch.Size([2, 4, 150, 106])
torch.Size([2, 5, 1, 1])
torch.Size([1, 154, 2048])
and
eps = shared.sd_model.model(x_in * c_in, t, {"vector": cond_in["c_concat"][0], "crossattn": cond_in["c_crossattn"][0]} )
gives
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k)
File "D:\stablediffusion_5\stable-diffusion-webui\venv\lib\site-packages\torch\functional.py", line 377, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: einsum(): subscript b has size 10 for operand 1 which does not broadcast with previously seen size 20
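That einsum failure is consistent with a batch mismatch rather than a feature-size one: q comes from x_in (batch 2; with 10 attention heads folded into the batch axis, 20) while k comes from the crossattn context (batch 1, giving 10). Note also the [1, 154, 2048] shape printed above: 154 = 2 × 77, so the dim=1 concat from the earlier patch doubled the token length instead of the batch. A hedged sketch of the concat that should line up, assuming the OpenAIWrapper key names:

# Untested: batch uncond/cond along dim=0 so the context batch matches x_in's batch of 2.
crossattn_in = torch.cat([uncond["crossattn"], cond["crossattn"]], dim=0)  # -> [2, 77, 2048]
vector_in = torch.cat([uncond["vector"], cond["vector"]], dim=0)           # -> [2, 2816]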
Honestly, I feel like this is really, really close to working; I just have no idea how to fix it.
The forward definition is:

Apply the model to an input batch.
:param x: an [N x C x ...] Tensor of inputs.
:param timesteps: a 1-D batch of timesteps.
:param context: conditioning plugged in via crossattn
:param y: an [N] Tensor of labels, if class-conditional.
:return: an [N x C x ...] Tensor of outputs.
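Putting that together, here is a minimal, untested sketch of what an SDXL branch in find_noise_for_image_sigma_adjustment might look like. The "crossattn"/"vector" key names come from the OpenAIWrapper snippet above; whether shared.sd_model.model is the right entry point (versus a patched apply_model) is also an assumption:

import torch
from modules import shared

def sdxl_eps(x_in, c_in, t, cond, uncond):
    # Untested sketch: batch uncond/cond along dim=0 and keep the pooled
    # "vector" embedding, so OpenAIWrapper can pass y to the class-conditional
    # UNet and the context batch matches x_in (uncond half + cond half).
    cond_in = {
        "crossattn": torch.cat([uncond["crossattn"], cond["crossattn"]], dim=0),
        "vector": torch.cat([uncond["vector"], cond["vector"]], dim=0),
    }
    return shared.sd_model.model(x_in * c_in, t, cond_in)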
Made a fix: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/16761