KSampler at::cuda::blas::getrsBatched: not supported for HIP on Windows
Custom Node Testing
- [x] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Expected Behavior
When I run WAN2.1 on my RX 9070, sampling should complete and produce a video normally.
Actual Behavior
When I try to run WAN2.1 on my RX 9070 with HIP SDK 6.4.2 on Windows, ComfyUI raises an error:
KSampler at::cuda::blas::getrsBatched: not supported for HIP on Windows
Steps to Reproduce
Download the latest AMD portable version, install HIP SDK 6.4.2, then try to generate a video with WAN2.2_5B's default parameters.
Debug Logs
D:\Tools\ComfyUI>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --disable-smart-memory
Checkpoint files will always be loaded safely.
Total VRAM 16304 MB, total RAM 48232 MB
pytorch version: 2.8.0a0+gitfc14c65
AMD arch: gfx1201
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: cuda:0 AMD Radeon RX 9070 : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.3.64
ComfyUI frontend version: 1.27.10
[Prompt Server] web root: D:\Tools\ComfyUI\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Import times for custom nodes:
0.0 seconds: D:\Tools\ComfyUI\ComfyUI\custom_nodes\websocket_image_save.py
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Using scaled fp8: fp8 matrix mult: False, scale input: False
Requested to load WanTEModel
loaded completely 9.5367431640625e+25 6419.477203369141 True
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cuda:0, dtype: torch.float16
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN22
0 models unloaded.
loaded partially 6405.9248046875 6403.3831787109375 0
10%|████████▎ | 2/20 [02:05<18:45, 62.52s/it]
!!! Exception during processing !!! at::cuda::blas::getrsBatched: not supported for HIP on Windows
Traceback (most recent call last):
File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 496, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 315, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 289, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "D:\Tools\ComfyUI\ComfyUI\execution.py", line 277, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\nodes.py", line 1525, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\nodes.py", line 1492, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\sample.py", line 45, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1161, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1051, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1036, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 1004, in outer_sample
output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 987, in inner_sample
samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\samplers.py", line 759, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 868, in sample_unipc
x = uni_pc.sample(noise, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 722, in sample
x, model_x = self.multistep_uni_pc_update(x, model_prev_list, t_prev_list, vec_t, init_order, use_corrector=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 472, in multistep_uni_pc_update
return self.multistep_uni_pc_bh_update(x, model_prev_list, t_prev_list, t, order, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Tools\ComfyUI\ComfyUI\comfy\extra_samplers\uni_pc.py", line 653, in multistep_uni_pc_bh_update
rhos_c = torch.linalg.solve(R, b)
^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: at::cuda::blas::getrsBatched: not supported for HIP on Windows
Prompt executed in 143.15 seconds
Other
Related issue: https://github.com/ROCm/TheRock/issues/1367. After reading through that entire issue, I still cannot solve this problem on my own.
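In case it helps triage, here is a minimal standalone script that I believe exercises the same code path (my assumption: on this Windows ROCm build, torch.linalg.solve for a small system dispatches to the batched getrs kernel named in the traceback):

```python
import torch

# ROCm builds of PyTorch expose the AMD GPU through the "cuda" device name.
device = torch.device("cuda")

# Small square system, same shape class as the R/b solve in uni_pc.py.
R = torch.randn(3, 3, device=device)
b = torch.randn(3, device=device)

# Expected failure on affected Windows ROCm builds:
# RuntimeError: at::cuda::blas::getrsBatched: not supported for HIP on Windows
x = torch.linalg.solve(R, b)
print(x)
```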
I have exactly the same error on an RX 7900 GRE, but I have not disabled smart memory management.
I ran into the same issue with my 7900 XTX; for me, the problem was the default sampler "uni_pc". Switching to euler worked.
Same issue on a 7900 XTX; switching to euler worked.
I switched to euler and it worked too, on my 9070 GRE.
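Switching to euler sidesteps the problem because its update rule never calls torch.linalg.solve. For anyone who needs uni_pc specifically, an untested workaround sketch (not an official fix) is to route the small linear solve in comfy/extra_samplers/uni_pc.py through the CPU; R is only an order-by-order matrix, so the round trip per step should be cheap:

```python
# Sketch of a local patch in comfy/extra_samplers/uni_pc.py,
# multistep_uni_pc_bh_update (the line shown in the traceback).
# Solving on the CPU avoids the unsupported getrsBatched HIP kernel;
# R is a small (order x order) matrix, so the copy cost is negligible.
rhos_c = torch.linalg.solve(R.cpu(), b.cpu()).to(b.device)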