Screen freezes when running KSampler
Expected Behavior
ComfyUI runs the workflow and generates the image without the system freezing.
Actual Behavior
The screen freezes while the mouse is still responsive.
The GPU fans didn't spin at first, but the card is now hot and blowing air, so I think it is still running something; however, the entire GUI is unresponsive.
This is the second time this has happened, on a second install, on a machine that fails at every level.
Steps to Reproduce
/
Debug Logs
/
Other
I will let it run for now and see if it finishes.
Usually everything crashes at this point. I switched to another 7900 XTX, yet the same issue occurs: once I hit KSampler, it is only a matter of time before it crashes my whole system.
All the hardware is the same except the motherboard; the workflow I run is identical to one I run on another machine with the same hardware specs.
I started lowering the resolution. For some reason 1200x800 can cause issues while 500x500 runs fine, but even then the VAE encode exhausts my 24 GB of VRAM on a model I have never had issues with.
This is insane: a 500x500 px latent image running through KSampler pushes VRAM usage from 0 to 100% of 24 GB in less than a second.
I have this issue too: when KSampler starts running, the computer slows down to the point of being unresponsive. If I exit or kill the server, the computer returns to normal. I have a 3090, 64 GB of RAM, and a beefy Ryzen. I use ComfyUI portable.
OK, in my case I fixed it. Check which CUDA version your ComfyUI needs based on the PyTorch build in use. Use this command:
python -c "import torch; print(torch.version.cuda)"
It will print the required CUDA version; then install that version.
If you are using portable ComfyUI like me, you have to run the command with the Python interpreter bundled inside the portable install (see the sketch below).
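A minimal sketch of that check for the portable build, assuming the default python_embeded folder that ships with it, run from the portable ComfyUI directory:
.\python_embeded\python.exe -c "import torch; print(torch.version.cuda)"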
Update 2:
After a series of tests, I found that the driver version was wrong; the required driver is listed here: https://www.amd.com/en/resources/support-articles/release-notes/RN-AMDGPU-WINDOWS-PYTORCH-7-1-1.html
My situation is somewhat similar to yours, with an RX 9070 (16 GB VRAM + 64 GB RAM).
I'm using ComfyUI_windows_portable_amd.7z (0.4.0) with AMD Adrenalin 25.12.1.
In my case, however, the mouse initially worked, but after a while it stopped moving and the entire system froze completely.
During one of multiple attempts, the graphics driver crashed, and ComfyUI output the following log:
E:\AI\ComfyUI>SET http_proxy=http://127.0.0.1:10800
E:\AI\ComfyUI>SET https_proxy=http://127.0.0.1:10800
E:\AI\ComfyUI>.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build
Adding extra search path checkpoints E:\AI\StableDiffusion\userspace\models\Stable-diffusion
[WARNING] failed to run amdgpu-arch: binary not found.
Checkpoint files will always be loaded safely.
Total VRAM 16304 MB, total RAM 65462 MB
pytorch version: 2.9.0+rocmsdk20251116
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1201
ROCm version: (7, 1)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 : native
Enabled pinned memory 29457.0
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.12.10 (tags/v3.12.10:0cc8128, Apr 8 2025, 12:21:36) [MSC v.1943 64 bit (AMD64)]
ComfyUI version: 0.4.0
ComfyUI frontend version: 1.33.13
[Prompt Server] web root: E:\AI\ComfyUI\python_embeded\Lib\site-packages\comfyui_frontend_package\static
Total VRAM 16304 MB, total RAM 65462 MB
pytorch version: 2.9.0+rocmsdk20251116
Set: torch.backends.cudnn.enabled = False for better AMD performance.
AMD arch: gfx1201
ROCm version: (7, 1)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 9070 : native
Enabled pinned memory 29457.0
Import times for custom nodes:
0.0 seconds: E:\AI\ComfyUI\ComfyUI\custom_nodes\websocket_image_save.py
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely; 14633.80 MB usable, 235.84 MB loaded, full load: True
Requested to load BaseModel
loaded completely; 14266.08 MB usable, 1639.41 MB loaded, full load: True
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:22<00:00, 1.14s/it]
Requested to load AutoencoderKL
loaded completely; 10368.12 MB usable, 159.56 MB loaded, full load: True
E:\AI\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\conv.py:543: UserWarning: bgemm_internal_cublaslt error: HIPBLAS_STATUS_INTERNAL_ERROR when calling hipblasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 4096 n 4 k 4 lda 4096 ldb 4 ldc 4096 abType 14 cType 14 computeType 2 scaleType 0. Will attempt to recover by calling cublas instead. (Triggered internally at C:/b/pytorch/aten/src/ATen/hip/HIPBlas.cpp:560.)
return F.conv2d(
!!! Exception during processing !!! CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
Traceback (most recent call last):
File "E:\AI\ComfyUI\ComfyUI\execution.py", line 515, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "E:\AI\ComfyUI\ComfyUI\execution.py", line 291, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\nodes.py", line 298, in decode
images = vae.decode(samples["samples"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\comfy\sd.py", line 783, in decode
out = self.process_output(self.first_stage_model.decode(samples, **vae_options).to(self.output_device).float())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\comfy\ldm\models\autoencoder.py", line 252, in decode
dec = self.post_quant_conv(z)
^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\ComfyUI\comfy\ops.py", line 200, in forward
return super().forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\conv.py", line 548, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\AI\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\conv.py", line 543, in _conv_forward
return F.conv2d(
^^^^^^^^^
RuntimeError: CUDA error: HIPBLAS_STATUS_ALLOC_FAILED when calling `hipblasCreate(handle)`
Prompt executed in 30.49 seconds
This is the workflow:
Update:
After some attempts, I found that I can't generate images with a width or height of 512 or greater. For example, a 384x384 image generates without freezing.