
KSampler at::cuda::blas::getrsBatched: not supported for HIP on Windows

Status: Open. QyInvoLing opened this issue 2 months ago • 4 comments

Custom Node Testing

Expected Behavior

When I run WAN2.1 on my RX 9070, it should work normally.

Actual Behavior

When I run WAN2.1 on my RX 9070 with HIP SDK 6.4.2 on Windows, ComfyUI fails with this error:

KSampler at::cuda::blas::getrsBatched: not supported for HIP on Windows

Steps to Reproduce

1. Download the latest AMD portable build of ComfyUI.
2. Install HIP SDK 6.4.2.
3. Generate a video using WAN2.2_5B's default parameters.

Debug Logs

D:\Tools\ComfyUI>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-smart-memory
Checkpoint files will always be loaded safely.
Total VRAM 16304 MB, total RAM 48232 MB
pytorch version: 2.8.0a0+gitfc14c65
AMD arch: gfx1201
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 9070 : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr  8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.64
ComfyUI frontend version: 1.27.10
[Prompt Server] web root: D:\Tools\ComfyUI\python_embeded\Lib\site-packages\comfyui_frontend_package\static

Import times for custom nodes:
   0.0 seconds: D:\Tools\ComfyUI\ComfyUI\custom_nodes\websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load WanTEModel
loaded completely 9.5367431640625e+25 6419.477203369141 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN22
0 models unloaded.
loaded partially 6405.9248046875 6403.3831787109375 0
 10%|████████▎                                                                          | 2/20 [02:05<18:45, 62.52s/it]
!!! Exception during processing !!! at::cuda::blas::getrsBatched: not supported for HIP on Windows
Traceback (most recent call last):
  File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\nodes.py", line 1525, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\sample.py", line 45, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1161, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1051, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1036, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1004, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 987, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 759, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 868, in sample_unipc
    x = uni_pc.sample(noise, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 722, in sample
    x, model_x = self.multistep_uni_pc_update(x, model_prev_list, t_prev_list, vec_t, init_order, use_corrector=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 472, in multistep_uni_pc_update
    return self.multistep_uni_pc_bh_update(x, model_prev_list, t_prev_list, t, order, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 653, in multistep_uni_pc_bh_update
    rhos_c = torch.linalg.solve(R, b)
             ^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: at::cuda::blas::getrsBatched: not supported for HIP on Windows

Prompt executed in 143.15 seconds
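The traceback shows that the failure comes from a single call, `rhos_c = torch.linalg.solve(R, b)`, inside ComfyUI's UniPC sampler (`uni_pc.py`, line 653); the HIP build of PyTorch on Windows rejects the batched LU solve (`getrsBatched`) it dispatches to. One possible local workaround, untested here and not part of ComfyUI or PyTorch, is to retry just that solve on the CPU when this specific error is raised. The sketch below assumes you are willing to patch that call site yourself; `with_cpu_fallback` and `solve_on_cpu` are hypothetical helper names, and the CPU path assumes `R` and `b` are float32 or float64.

```python
import functools

# Hypothetical marker: copied from the RuntimeError text in the log above.
UNSUPPORTED_MARKER = "not supported for HIP on Windows"


def with_cpu_fallback(gpu_fn, cpu_fn, marker=UNSUPPORTED_MARKER):
    """Hypothetical helper: call gpu_fn; only if it raises a RuntimeError whose
    message contains `marker`, retry the same arguments with cpu_fn."""
    @functools.wraps(gpu_fn)
    def wrapper(*args, **kwargs):
        try:
            return gpu_fn(*args, **kwargs)
        except RuntimeError as exc:
            if marker not in str(exc):
                raise  # unrelated error (e.g. out of memory): do not mask it
            return cpu_fn(*args, **kwargs)
    return wrapper


def solve_on_cpu(A, B):
    """Hypothetical helper: solve A X = B on the CPU, then move X back to A's device.
    Assumes float32/float64 inputs. PyTorch is imported lazily so the wrapper
    above has no torch dependency."""
    import torch
    return torch.linalg.solve(A.cpu(), B.cpu()).to(A.device)
```

At the call site from the traceback, this would amount to replacing `torch.linalg.solve(R, b)` with `with_cpu_fallback(torch.linalg.solve, solve_on_cpu)(R, b)`. Local edits to ComfyUI's files may be lost when updating, so changing the sampler, as the comments below suggest, is the less invasive option.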

Other

Related issue: https://github.com/ROCm/TheRock/issues/1367. Even after reading that entire issue, I still couldn't solve this problem myself.

QyInvoLing · Oct 09 '25 16:10

I get exactly the same error on an RX 7900 GRE, even though I haven't disabled smart memory management.

TooEvil0222 · Oct 18 '25 20:10

I ran into the same issue with my 7900 XTX. For me, the problem was the default sampler, "uni_pc"; switching to euler made it work.

Cayaf89 · Oct 20 '25 19:10

Same issue on a 7900 XTX; switching to euler worked.

siddagivers · Oct 31 '25 14:10

I switched to euler and it worked on my 9070 GRE.

Yamserenity · Dec 04 '25 05:12
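Several commenters report that changing the KSampler's `sampler_name` from the default `uni_pc` to `euler` avoids the error. That is consistent with the traceback, where the failing `torch.linalg.solve` call sits inside the UniPC sampler. In the UI this is just the sampler dropdown on the KSampler node. For workflows exported in ComfyUI's API (JSON) format, a minimal sketch of the same change follows. It assumes the API-format layout of node IDs mapping to objects with `class_type` and `inputs`; `replace_sampler` and `patch_file` are hypothetical helpers, not ComfyUI functions.

```python
import json


def replace_sampler(workflow, old="uni_pc", new="euler"):
    """Hypothetical helper: set sampler_name from `old` to `new` on every node
    that has that input. Mutates `workflow` in place; returns the change count.
    Assumes API format: {node_id: {"class_type": ..., "inputs": {...}}}."""
    changed = 0
    for node in workflow.values():
        inputs = node.get("inputs", {})
        if inputs.get("sampler_name") == old:
            inputs["sampler_name"] = new
            changed += 1
    return changed


def patch_file(path, old="uni_pc", new="euler"):
    """Hypothetical helper: apply replace_sampler to an API-format JSON file on disk."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    changed = replace_sampler(workflow, old, new)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(workflow, f, indent=2)
    return changed
```

Only nodes whose `sampler_name` is exactly `old` are touched, so other inputs such as `steps` or `scheduler` keep their values.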