
[Bug]: RuntimeError with Refiner Enabled and Batch Count > 1 in img2img (note: refiner works when batch=1)

Open djdarcy opened this issue 1 year ago • 2 comments

Checklist

  • [X] The issue exists after disabling all extensions
  • [X] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [X] The issue exists in the current version of the webui
  • [X] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

When the "Refiner" is enabled in the img2img tab and "Batch count" is set greater than 1, generation crashes with a RuntimeError indicating a device mismatch between CPU and CUDA. I did a complete git clone and fresh setup to confirm the issue also occurs on a clean installation.

Steps to reproduce the problem

  1. Launch the webui.
  2. Navigate to the img2img tab.
  3. Enable the "Refiner".
  4. Set "Batch count" to a value greater than 1.
  5. Click Generate to process more than one image.

What should have happened?

The application should process multiple images in a batch with the refiner enabled without encountering a device mismatch error.

What browsers do you use to access the UI?

Mozilla Firefox, Google Chrome

Sysinfo

sysinfo-2024-01-12-03-30.json

Console logs

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

---

Full console log / stack-trace
Using already loaded custom-sdxl.safetensors [4d161dc67e]: done in 4.1s (send model to cpu: 2.1s, send model to device: 2.1s)
2024-01-11 18:57:10,637 - ControlNet - INFO - unit_separate = False, style_align = False
2024-01-11 18:57:10,801 - ControlNet - INFO - Loading model from cache: diffusers_xl_depth_full [2f51180b]
2024-01-11 18:57:10,804 - ControlNet - INFO - Loading preprocessor: depth
2024-01-11 18:57:10,805 - ControlNet - INFO - preprocessor resolution = 1604
2024-01-11 18:57:10,865 - ControlNet - INFO - ControlNet Hooked - Time = 0.23399829864501953
  0%|                                                                                                                                 | 0/49 [00:00<?, ?it/s]
Restoring base VAE
Applying attention optimization: xformers... done.
VAE weights loaded.
*** Error completing request
*** Arguments: ('task(2i8qc2q0e4mpuk0)', 0, 'positive prompt', 'negative prompts', [], <PIL.Image.Image image mode=RGBA size=1600x2000 at 0x27D89FC8A90>, None, None, None, None, None, None, 75, 'DPM++ 3M SDE Karras', 4, 0, 1, 2, 1, 9, 1.5, 0.65, 0, 2000, 1600, 1, 0, 0, 32, 0, '', '', '', ['VAE: sdxl-model.vae.safetensors'], False, [], '', <gradio.routes.Request object at 0x0000027D89B88670>, 0, True, 'sd_xl_refiner_1.0.safetensors [7440042bbd]', 0.9, -1, False, -1, 0, 0, 0, <scripts.animatediff_ui.AnimateDiffProcess object at 0x0000027D89B8BFD0>, UiControlNetUnit(enabled=True, module='depth_midas', model='diffusers_xl_depth_full [2f51180b]', weight=0.55, image={'image': array([... raw uint8 pixel data elided ...], dtype=uint8), 'mask': array([... raw uint8 mask data elided ...], dtype=uint8)}, resize_mode='Crop and Resize', low_vram=False, processor_res=512, threshold_a=-1, threshold_b=-1, guidance_start=0.08, guidance_end=0.6, pixel_perfect=True, control_mode='Balanced', inpaint_crop_input_image=True, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), UiControlNetUnit(enabled=False, module='none', model='None', weight=1, image=None, resize_mode='Crop and Resize', low_vram=False, processor_res=-1, threshold_a=-1, threshold_b=-1, guidance_start=0, guidance_end=1, pixel_perfect=False, control_mode='Balanced', inpaint_crop_input_image=False, hr_option='Both', save_detected_map=True, advanced_weighting=None), '* `CFG Scale` should be 2 or lower.', True, True, '', '', True, 50, True, 1, 0, False, 4, 0.5, 'Linear', 'None', '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, 'positive', 'comma', 0, False, False, 'start', '', '<p style="margin-bottom:0.75em">Will upscale the image by the selected scale factor; use width and height sliders to set tile size</p>', 64, 0, 2, 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0, False, None, None, False, None, None, False, None, None, False, 50, '<p style="margin-bottom:0.75em">Will upscale the image depending on the selected target size type</p>', 512, 0, 8, 32, 64, 0.35, 32, 0, True, 0, False, 8, 0, 0, 2048, 2048, 2) {}
    Traceback (most recent call last):
      File "C:\automatic1111-sd-webui\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\automatic1111-sd-webui\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\img2img.py", line 238, in img2img
        processed = process_images(p)
      File "C:\automatic1111-sd-webui\modules\processing.py", line 734, in process_images
        res = process_images_inner(p)
      File "C:\automatic1111-sd-webui\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 42, in processing_process_images_hijack
        return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\processing.py", line 868, in process_images_inner
        samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
      File "C:\automatic1111-sd-webui\extensions\sd-webui-controlnet\scripts\hook.py", line 435, in process_sample
        return process.sample_before_CN_hack(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\processing.py", line 1527, in sample
        samples = self.sampler.sample_img2img(self, self.init_latent, x, conditioning, unconditional_conditioning, image_conditioning=self.image_conditioning)
      File "C:\automatic1111-sd-webui\modules\sd_samplers_kdiffusion.py", line 188, in sample_img2img
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\automatic1111-sd-webui\modules\sd_samplers_common.py", line 261, in launch_sampling
        return func()
      File "C:\automatic1111-sd-webui\modules\sd_samplers_kdiffusion.py", line 188, in <lambda>
        samples = self.launch_sampling(t_enc + 1, lambda: self.func(self.model_wrap_cfg, xi, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\automatic1111-sd-webui\repositories\k-diffusion\k_diffusion\sampling.py", line 668, in sample_dpmpp_3m_sde
        denoised = model(x, sigmas[i] * s_in, **extra_args)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\sd_samplers_cfg_denoiser.py", line 188, in forward
        x_out[a:b] = self.inner_model(x_in[a:b], sigma_in[a:b], cond=make_condition_dict(c_crossattn, image_cond_in[a:b]))
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\repositories\k-diffusion\k_diffusion\external.py", line 112, in forward
        eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
      File "C:\automatic1111-sd-webui\repositories\k-diffusion\k_diffusion\external.py", line 138, in get_eps
        return self.inner_model.apply_model(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\sd_models_xl.py", line 37, in apply_model
        return self.model(x, t, cond)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\sd_hijack_utils.py", line 17, in <lambda>
        setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
      File "C:\automatic1111-sd-webui\modules\sd_hijack_utils.py", line 28, in __call__
        return self.__orig_func(*args, **kwargs)
      File "C:\automatic1111-sd-webui\repositories\generative-models\sgm\modules\diffusionmodules\wrappers.py", line 28, in forward
        return self.diffusion_model(
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\modules\sd_unet.py", line 91, in UNetModel_forward
        return original_forward(self, x, timesteps, context, *args, **kwargs)
      File "C:\automatic1111-sd-webui\repositories\generative-models\sgm\modules\diffusionmodules\openaimodel.py", line 984, in forward
        emb = self.time_embed(t_emb)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\container.py", line 217, in forward
        input = module(input)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
        return forward_call(*args, **kwargs)
      File "C:\automatic1111-sd-webui\extensions-builtin\Lora\networks.py", line 486, in network_Linear_forward
        return originals.Linear_forward(self, input)
      File "C:\automatic1111-sd-webui\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
        return F.linear(input, self.weight, self.bias)
    RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
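For reference, the final error in the trace can be reproduced in isolation with a few lines of PyTorch (a minimal sketch, not webui code). It occurs whenever a linear layer's weights are left on the CPU while its input arrives on CUDA, which is consistent with the refiner swap offloading the base model and not fully restoring it before the next batch iteration:

```python
import torch
import torch.nn as nn

# A linear layer whose weights live on the CPU, as if a model swap
# offloaded them and never moved them back.
layer = nn.Linear(320, 1280)

# An input tensor on the GPU, as the sampler produces mid-batch.
x = torch.randn(2, 320, device="cuda")

layer(x)
# RuntimeError: Expected all tensors to be on the same device, but found
# at least two devices, cpu and cuda:0! (when checking argument for
# argument mat1 in method wrapper_CUDA_addmm)
```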

---

Additional information

  • GPU: NVIDIA GeForce RTX 3090 with 24 GB of VRAM.
  • The checkpoint model is 6.46 GB; the VAE plus the stock refiner fit comfortably in 24 GB.
  • The issue appears related to model management during batch processing, possibly within the reuse_model_from_already_loaded function or related model-swapping logic; the diagnostic sketch below can help confirm this.
  • GPU drivers are up to date.
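If it helps with triage, here is a small diagnostic sketch for confirming the hypothesis above. It is plain PyTorch, not part of the webui API, and the suggested calling point (shared.sd_model.model.diffusion_model, between batch iterations) is an assumption based on the stack trace:

```python
def report_param_devices(model, label="model"):
    """Group a module's parameters by device so that weights stranded on
    the CPU after a refiner swap stand out (diagnostic sketch only)."""
    by_device = {}
    for name, param in model.named_parameters():
        by_device.setdefault(str(param.device), []).append(name)
    for device, names in sorted(by_device.items()):
        print(f"{label} on {device}: {len(names)} params (e.g. {names[0]})")

# Hypothetical usage from a debugger between batch iterations:
#   from modules import shared
#   report_param_devices(shared.sd_model.model.diffusion_model, "unet")
# A healthy run should report a single cuda:0 entry; a cpu entry at
# sampling time would match the RuntimeError in the log above.
```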

djdarcy · Jan 12 '24 03:01

Any chance we can fix this? It's a particularly annoying bug that only happens in img2img and prevents simple 4x4 batch runs and the like. It can be worked around by increasing "Batch size" instead of "Batch count", but that of course has a practical upper limit.
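Until it's fixed upstream, a blunt, untested mitigation sketch might be to force the whole model back onto the target device before sampling resumes. The names below (modules.shared, modules.devices) exist in the webui codebase, but wiring this into the batch loop is left as an assumption:

```python
from modules import devices, shared

def force_model_to_device():
    """Untested mitigation sketch: if any parameter of the active model
    was left behind on the CPU by a refiner swap, move the whole module
    back to the webui's target device."""
    model = shared.sd_model
    if any(p.device.type == "cpu" for p in model.parameters()):
        model.to(devices.device)
```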

clayne · Mar 14 '24 20:03

I'm having the same problem. After hours of trial and error, I narrowed it down to the use of the refiner; I get the same errors as the OP. I have a 24 GB VRAM card, if that's any help.

FugueSegue · Mar 16 '24 21:03