
HIP memory issue on 9070 XT with ROCm 7.0.2

Open druidican opened this issue 2 months ago • 30 comments

Custom Node Testing

Expected Behavior

A simple generation of an image.

Actual Behavior

During image generation, the following error message appears.

Steps to Reproduce

Install rocm-7.0.2 and use a default workflow with KSampler -> Ultimate SD Upscale -> FaceDetailer.

After one image, every following generation fails.

Debug Logs

# ComfyUI Error Report
## Error Details
- **Node ID:** 116
- **Node Type:** FaceDetailer
- **Exception Type:** torch.AcceleratorError
- **Exception Message:** HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.


## Stack Trace

  File "/home/lasse/ComfyUI/execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-lora-manager/py/metadata_collector/metadata_hook.py", line 165, in async_map_node_over_list_with_metadata
    results = await original_map_node_over_list(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)

  File "/home/lasse/ComfyUI/execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 876, in doit
    enhanced_img, cropped_enhanced, cropped_enhanced_alpha, mask, cnet_pil_list = FaceDetailer.enhance_face(
                                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 830, in enhance_face
    DetailerForEach.do_detail(image, segs, model, clip, vae, guide_size, guide_size_for_bbox, max_size, seed, steps, cfg,

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 362, in do_detail
    enhanced_image, cnet_pils = core.enhance_detail(cropped_image, model, clip, vae, guide_size, guide_size_for_bbox, max_size,
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/core.py", line 383, in enhance_detail
    refined_latent = impact_sampling.ksampler_wrapper(model2, seed2, steps2, cfg2, sampler_name2, scheduler2, positive2, negative2,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 209, in ksampler_wrapper
    refined_latent = separated_sample(model, True, seed, advanced_steps, cfg, sampler_name, scheduler,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 182, in separated_sample
    res = sample_with_custom_noise(model, add_noise, seed, cfg, positive, negative, impact_sampler, sigmas, latent_image, noise=noise, callback=callback)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 126, in sample_with_custom_noise
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/sample.py", line 50, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1044, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1029, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 997, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/k_diffusion/sampling.py", line 795, in sample_dpmpp_2m
    denoised = model(x, sigmas[i] * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 401, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 953, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 960, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 963, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 381, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 326, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/model_base.py", line 161, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/model_base.py", line 200, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 831, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 873, in _forward
    h = forward_timestep_embed(module, h, emb, context, transformer_options, time_context=time_context, num_video_frames=num_video_frames, image_only_indicator=image_only_indicator)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 38, in forward_timestep_embed
    x = layer(x, emb)
        ^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 239, in forward
    return checkpoint(
           ^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/util.py", line 191, in checkpoint
    return func(*inputs)
           ^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py", line 252, in _forward
    h = self.in_layers(x)
        ^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/container.py", line 244, in forward
    input = module(input)
            ^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ops.py", line 146, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ops.py", line 141, in forward_comfy_cast_weights
    return self._conv_forward(input, weight, bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 543, in _conv_forward
    return F.conv2d(
           ^^^^^^^^^
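
As the exception message notes, HIP errors are reported asynchronously, so the `F.conv2d` frame at the bottom of this trace is not necessarily where the fault occurred. A minimal sketch of a standalone check follows. It runs a small convolution outside ComfyUI and synchronizes after each call, so a fault would surface at the call that caused it. The function name `conv2d_smoke_test` and the tensor shapes are illustrative assumptions, and nothing in this thread shows that such a small case reproduces the crash.

```python
def conv2d_smoke_test(iterations=10, device=None):
    """Run a small conv2d repeatedly, synchronizing after each call.

    Hypothetical helper for illustration. Returns the output shape,
    or None if torch is not installed in the active environment.
    """
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return None

    if device is None:
        # ROCm builds of PyTorch expose the GPU under the "cuda" device name.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Half precision on the GPU, float32 on the CPU fallback.
    dtype = torch.float16 if device == "cuda" else torch.float32

    x = torch.randn(1, 4, 64, 64, device=device, dtype=dtype)  # latent-like input
    w = torch.randn(8, 4, 3, 3, device=device, dtype=dtype)    # 3x3 conv weights

    y = None
    for _ in range(iterations):
        y = F.conv2d(x, w, padding=1)
        if device == "cuda":
            # Force pending kernels to finish so errors are raised here.
            torch.cuda.synchronize()
    return tuple(y.shape)


if __name__ == "__main__":
    print(conv2d_smoke_test())
```

If this passes on the GPU while the full workflow still fails, the fault is more likely tied to a specific shape, memory pressure, or an earlier kernel than to `conv2d` itself.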


## System Information
- **ComfyUI Version:** 0.3.65
- **Arguments:** main.py --listen 0.0.0.0 --output-directory /home/lasse/MEGA/ComfyUI --use-pytorch-cross-attention --reserve-vram 1 --lowvram --fast --disable-smart-memory
- **OS:** posix
- **Python Version:** 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
- **Embedded Python:** false
- **PyTorch Version:** 2.8.0+rocm7.0.2.git245bf6ed
## Devices

- **Name:** cuda:0 AMD Radeon Graphics : native
  - **Type:** cuda
  - **VRAM Total:** 17095983104
  - **VRAM Free:** 15807266816
  - **Torch VRAM Total:** 161480704
  - **Torch VRAM Free:** 17809408
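
The VRAM figures above are raw byte counts. The short sketch below converts them to binary units. The values are copied from the report and the helper names are illustrative. The total works out to about 15.92 GiB, which matches a 16 GB card, and PyTorch itself was holding only 154 MiB when the report was written.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return n_bytes / MIB


# Values copied from the Devices section of the report.
report = {
    "VRAM Total": 17095983104,
    "VRAM Free": 15807266816,
    "Torch VRAM Total": 161480704,
    "Torch VRAM Free": 17809408,
}

for name, value in report.items():
    print(f"{name}: {to_gib(value):.2f} GiB ({to_mib(value):.1f} MiB)")
```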

## Logs

Logged at 2025-10-16T15:00:55:

  File "/home/lasse/ComfyUI/main.py", line 195, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "/home/lasse/ComfyUI/execution.py", line 649, in execute
    asyncio.run(self.execute_async(prompt, prompt_id, extra_data, execute_outputs))
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/lasse/ComfyUI/execution.py", line 722, in execute_async
    comfy.model_management.unload_all_models()
  File "/home/lasse/ComfyUI/comfy/model_management.py", line 1399, in unload_all_models
    free_memory(1e30, get_torch_device())
                      ^^^^^^^^^^^^^^^^^^
  File "/home/lasse/ComfyUI/comfy/model_management.py", line 187, in get_torch_device
    return torch.device(torch.cuda.current_device())
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1072, in current_device
    return torch._C._cuda_getDevice()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
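
The error text recommends passing `AMD_SERIALIZE_KERNEL=3` when debugging. The sketch below shows one way to prepare a ComfyUI launch with that variable set. The helper name `build_debug_launch` is an assumption made for illustration, and the example arguments are a subset of those listed under System Information. The variable is expected to serialize kernel launches so that the reported stack points closer to the failing call, but that has not been verified against this crash.

```python
import os
import sys


def build_debug_launch(comfy_args, base_env=None):
    """Return (env, cmd) for starting ComfyUI with serialized HIP kernels.

    Hypothetical helper: it only prepares the environment and command line
    and does not start anything itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Setting suggested by the HIP error message in the report.
    env["AMD_SERIALIZE_KERNEL"] = "3"
    cmd = [sys.executable, "main.py", *comfy_args]
    return env, cmd


if __name__ == "__main__":
    env, cmd = build_debug_launch(["--use-pytorch-cross-attention", "--lowvram"])
    print("AMD_SERIALIZE_KERNEL =", env["AMD_SERIALIZE_KERNEL"])
    print("command:", " ".join(cmd))
    # To actually start ComfyUI from its checkout directory:
    #   import subprocess
    #   subprocess.run(cmd, env=env, cwd="/path/to/ComfyUI")
```

Serialized launches are usually much slower, so this is meant for a single diagnostic run rather than regular use.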


## Attached Workflow
Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

Workflow too large. Please manually upload the workflow from local file system.


## Additional Context
(Please add any additional context or steps to reproduce the error here)

Other

No response

druidican, Oct 16 '25 13:10

I have an identical issue (9070 XT) with any ROCm 7 build, on both the latest AMD proprietary drivers and the Ubuntu base AMD drivers.

Kubuntu 24.04, latest nightly ComfyUI, but it has been happening across the past few weeks of ComfyUI builds.

I'm just using PyTorch with ROCm 6.4 for now because it doesn't crash like this.

CSFFlame, Oct 17 '25 04:10

How did you install ROCm: with the amdgpu-installer and DKMS, or in another way?

druidican, Oct 17 '25 07:10

with the amdgpu-installer and with DKMS

Yes, and PyTorch was installed with pip3:

pip3 install -U --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm7.0

CSFFlame, Oct 17 '25 08:10

I have the same issue. Here is how I installed ROCm. I've since reverted, but these were my commands.

From this guide

wget https://repo.radeon.com/amdgpu-install/7.0.2/ubuntu/noble/amdgpu-install_7.0.2.70002-1_all.deb
sudo apt install ./amdgpu-install_7.0.2.70002-1_all.deb
sudo apt update
sudo apt install rocm
apt list --installed | grep rocm  # Confirming the version

# Then in my test venv
source newenv/bin/activate
pip list
pip uninstall torch torchaudio torchvision
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0

For some reason this broke my other venv for ComfyUI, which runs stable PyTorch with ROCm 6.4. I don't know whether that's a ComfyUI issue or my system getting confused by having ROCm 7 installed while using the stable PyTorch ROCm 6.4 build.
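
When several venvs carry different PyTorch builds, it can help to confirm which ROCm build the active one actually loads. The sketch below reads the ROCm tag from a PyTorch version string such as the `2.8.0+rocm7.0.2.git245bf6ed` shown in the first report. The helper names are illustrative assumptions. `torch.version.hip` is the HIP version the wheel was built against and is `None` on non-ROCm builds.

```python
import re


def rocm_tag(torch_version: str):
    """Extract the ROCm version from a PyTorch version string, or None."""
    match = re.search(r"\+rocm(\d+(?:\.\d+)*)", torch_version)
    return match.group(1) if match else None


def describe_active_torch() -> str:
    """Describe the torch build importable from the current environment."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    return (
        f"torch {torch.__version__}, "
        f"ROCm tag {rocm_tag(torch.__version__)}, "
        f"HIP version {torch.version.hip}"
    )


if __name__ == "__main__":
    # Version string taken from the System Information section of the first report.
    print(rocm_tag("2.8.0+rocm7.0.2.git245bf6ed"))
    print(describe_active_torch())
```

Running this inside each venv after `source .../bin/activate` shows whether the environment really picked up the ROCm 6.4 or the ROCm 7 wheel.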

My error

  hidden_states_slice = torch.bmm(attn_probs.to(value.dtype), value)
/home/auser/git/comfy/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py:180: UserWarning: HIP warning: an illegal memory access was encountered (Triggered internally at /pytorch/aten/src/ATen/hip/impl/HIPGuardImplMasqueradingAsCUDA.h:83.)
  hidden_states_slice = torch.bmm(attn_probs.to(value.dtype), value)
  0%|                                                                                                                                                                                                                      | 0/2 [00:06<?, ?it/s]
!!! Exception during processing !!! HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 315, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 289, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 277, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/nodes.py", line 1559, in sample
    return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/sample.py", line 45, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 1161, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 1051, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 1036, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 1004, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 987, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 759, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 122, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/k_diffusion/sampling.py", line 199, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 408, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 960, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 967, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 970, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 388, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/samplers.py", line 333, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/model_base.py", line 160, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/model_base.py", line 199, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/wan/model.py", line 614, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/wan/model.py", line 634, in _forward
    return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options, **kwargs)[:, :, :t, :h, :w]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/wan/model.py", line 579, in forward_orig
    x = block(x, e=e0, freqs=freqs, context=context, context_img_len=context_img_len, transformer_options=transformer_options)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/wan/model.py", line 235, in forward
    y = self.self_attn(
        ^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1780, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1791, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/wan/model.py", line 81, in forward
    x = optimized_attention(
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/modules/attention.py", line 130, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/modules/attention.py", line 257, in attention_sub_quad
    hidden_states = efficient_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py", line 268, in efficient_dot_product_attention
    compute_query_chunk_attn(
  File "/home/auser/git/comfy/ComfyUI/comfy/ldm/modules/sub_quadratic_attention.py", line 180, in _get_attention_scores_no_kv_chunking
    hidden_states_slice = torch.bmm(attn_probs.to(value.dtype), value)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.


Exception in thread Thread-2 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/auser/git/comfy/ComfyUI/main.py", line 195, in prompt_worker
    e.execute(item[2], prompt_id, item[3], item[4])
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 649, in execute
    asyncio.run(self.execute_async(prompt, prompt_id, extra_data, execute_outputs))
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/execution.py", line 722, in execute_async
    comfy.model_management.unload_all_models()
  File "/home/auser/git/comfy/ComfyUI/comfy/model_management.py", line 1402, in unload_all_models
    free_memory(1e30, get_torch_device())
                      ^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/comfy/model_management.py", line 187, in get_torch_device
    return torch.device(torch.cuda.current_device())
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/auser/git/comfy/ComfyUI/newenv/lib/python3.12/site-packages/torch/cuda/__init__.py", line 1080, in current_device
    return torch._C._cuda_getDevice()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: an illegal memory access was encountered
Search for `hipErrorIllegalAddress' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Side note

The error output suggests setting AMD_SERIALIZE_KERNEL=3, but setting that produces:

UserWarning: Ignoring invalid value for boolean flag AMD_SERIALIZE_KERNEL: 3valid values are 0 or 1.

I set these env variables as a part of my troubleshooting.

export HIP_VISIBLE_DEVICES=0
export HIP_LAUNCH_BLOCKING=1
export AMD_SERIALIZE_KERNEL=1
export TORCH_BLAS_PREFER_HIPBLASLT=0
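
For anyone who prefers to apply the same settings from Python when launching ComfyUI as a child process, here is a minimal sketch. The helper name `build_debug_env` is hypothetical; the variable names and values are just the four exports listed above, and whether they help at all is untested here.

```python
import os

# Values copied from the exports above; the helper name is hypothetical.
DEBUG_OVERRIDES = {
    "HIP_VISIBLE_DEVICES": "0",
    "HIP_LAUNCH_BLOCKING": "1",
    "AMD_SERIALIZE_KERNEL": "1",
    "TORCH_BLAS_PREFER_HIPBLASLT": "0",
}


def build_debug_env(base_env=None):
    """Return a copy of the environment with the debug overrides applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(DEBUG_OVERRIDES)
    return env


if __name__ == "__main__":
    env = build_debug_env()
    for name in DEBUG_OVERRIDES:
        print(f"{name}={env[name]}")
    # To launch with these settings (not executed here):
    #   subprocess.run([sys.executable, "main.py"], env=env, check=True)
```

The variables need to be in the environment before torch is imported, which is why the sketch builds an environment for a child process instead of changing the current one.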

Env details

  Name:                    AMD Ryzen 9 7950X3D 16-Core Processor
  Marketing Name:          AMD Ryzen 9 7950X3D 16-Core Processor
  Vendor Name:             CPU                                
  Name:                    gfx1201                            
  Marketing Name:          AMD Radeon RX 9070 XT              
  Vendor Name:             AMD                                
      Name:                    amdgcn-amd-amdhsa--gfx1201         
      Name:                    amdgcn-amd-amdhsa--gfx12-generic   
torch 2.10.0.dev20251016+rocm7.0
HIP runtime: 7.0.51831-a3e329ad8
ROCm detected: True
Device: AMD Radeon RX 9070 XT
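
If others want to post comparable details, the torch side of the listing above can be gathered with something like the following sketch. It is an assumption about how to collect similar information, not the script that produced the output above; `collect_torch_info` is a hypothetical name, and it only uses `torch.__version__`, `torch.version.hip`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def collect_torch_info():
    """Gather torch/HIP details; returns {"torch": None} if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # torch.version.hip is None on non-ROCm builds.
        "hip": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        # ROCm devices are exposed through the torch.cuda namespace.
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_torch_info().items():
        print(f"{key}: {value}")
```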

I also double-checked that I have updated ComfyUI to the latest version:

comfyui-embedded-docs      0.3.0
comfyui_frontend_package   1.28.7
comfyui_workflow_templates 0.1.95

I wouldn't be surprised if this were also a PyTorch/ROCm issue, but I didn't see any open issues on those repos and I did see this one. Thanks, fellas. I've since reverted to 6.4, but I'm happy to do another upgrade test. I suppose my life would be a bit easier if I used Docker.

crosson avatar Oct 18 '25 01:10 crosson

This seems to be a growing issue with the 9000 series. I see a lot of people complaining about ROCm / Python instability, but no resolution. A few say it works, but so far none of them have been willing to describe how they did it, so I consider those reports unreliable for now.

druidican avatar Oct 18 '25 11:10 druidican

Same issue on the 7900GRE (gfx1100) with ROCm 7.x. Tried all the ROCm parameters in the book, nothing works.

x5nder avatar Oct 18 '25 14:10 x5nder

Same here - 9070XT with ROCm 7.x. It's not happening every time, but frequently enough to interrupt the workflow. It's always in the KSampler node for me.

System information: https://termbin.com/8e28 Output: https://gist.github.com/Nihlus/59edf3a7fc5ebb21c6bd5e243705b448

Nihlus avatar Oct 26 '25 14:10 Nihlus

Can you try to install 7.0.2 official wheels from here: https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installrad/native_linux/install-pytorch.html#install-pytorch-via-pip

slojosic-amd avatar Oct 30 '25 13:10 slojosic-amd

Can you try to install 7.0.2 official wheels from here: https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/install/installrad/native_linux/install-pytorch.html#install-pytorch-via-pip

Installing torch from there (rocm-7.0.2) does not fix the issue for me

Tureti avatar Oct 30 '25 15:10 Tureti

Same for me... neither a reinstall, an upgrade, nor any of the wheels fixes it for me. Same error every time.

druidican avatar Oct 30 '25 16:10 druidican

Using Distorch2 MultiGPU samples solved all my problems.

x5nder avatar Oct 30 '25 16:10 x5nder

How did you set up the Distorch2 MultiGPU samples?

Because I still get:

[...] encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "/home/lasse/ComfyUI/execution.py", line 499, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "/home/lasse/ComfyUI/execution.py", line 316, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "/home/lasse/ComfyUI/custom_nodes/ComfyUI-Lora-Manager/py/metadata_collector/metadata_hook.py", line 165, in async_map_node_over_list_with_metadata
    results = await original_map_node_over_list(
  File "/home/lasse/ComfyUI/execution.py", line 290, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/lasse/ComfyUI/execution.py", line 278, in process_inputs
    result = f(**inputs)
  File "/home/lasse/ComfyUI/nodes.py", line 1525, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "/home/lasse/ComfyUI/nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "/home/lasse/ComfyUI/comfy/sample.py", line 60, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1163, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 997, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/k_diffusion/sampling.py", line 800, in sample_dpmpp_2m
    if old_denoised is None or sigmas[i + 1] == 0:
torch.AcceleratorError: HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Prompt executed in 202.07 seconds
Exception in thread Thread-4 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lasse/ComfyUI/main.py", line 240, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "/home/lasse/ComfyUI/comfy/model_management.py", line 1432, in soft_empty_cache
    torch.cuda.empty_cache()
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/memory.py", line 224, in empty_cache
    torch._C._cuda_emptyCache()
torch.AcceleratorError: HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

druidican avatar Oct 30 '25 17:10 druidican

I run comfyui with the following batch file:

set PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:512
set PYTORCH_ALLOC_CONF=max_split_size_mb:512
set HSA_OVERRIDE_GFX_VERSION=11.0.0
set PYTORCH_HIP_FORCE_SHUTDOWN=1
cmd /c "C:\ComfyUI\venv\Scripts\activate.bat && cd C:\ComfyUI && python main.py --use-pytorch-cross-attention --disable-smart-memory --listen"

Then regardless of whether I run Qwen, WAN, or Flux, I use this loader with 'virtual_vram_gb' set to around 80% of the file size of the model. Example:

Image

So far, every workflow I throw at it works flawlessly.
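
The "around 80% of the file size" rule of thumb above is simple arithmetic; a minimal sketch of it follows. The function names are hypothetical, and treating "GB" as GiB (1024^3 bytes) is an assumption, since the loader's exact unit is not stated here.

```python
import os

BYTES_PER_GIB = 1024 ** 3  # assumption: the setting is interpreted in GiB


def virtual_vram_gb_for_size(size_bytes, fraction=0.8):
    """Return `fraction` of a size in bytes, expressed in GiB (1 decimal)."""
    return round(size_bytes / BYTES_PER_GIB * fraction, 1)


def suggest_virtual_vram_gb(model_path, fraction=0.8):
    """Apply the same rule to a model file on disk."""
    return virtual_vram_gb_for_size(os.path.getsize(model_path), fraction)


if __name__ == "__main__":
    # A 10 GiB model file would give a setting of about 8.0.
    print(virtual_vram_gb_for_size(10 * BYTES_PER_GIB))
```

The result is only a starting point for the 'virtual_vram_gb' field; the right value may differ per model and per card.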

x5nder avatar Oct 30 '25 17:10 x5nder

Thanks for the input :) but sadly it still fails for me. Same OOM.

druidican avatar Oct 30 '25 18:10 druidican

Still happening with updated rocm (https://repo.radeon.com/rocm/apt/7.1/) and torch (https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1/)

On the bright side TorchCompileModel is now also broken and only generates black images (if illegal memory access doesn't happen first)

Tureti avatar Oct 30 '25 20:10 Tureti

I have now tried upgrading to the just-released 7.1 and running with it. I can make one picture, if I'm lucky, but all subsequent pictures fail almost immediately with the same OOM.

druidican avatar Oct 31 '25 05:10 druidican

I have just completely reinstalled Ubuntu 24.04, ran apt-get update && apt-get upgrade -y, rebooted, and then installed ROCm with the following commands:

sudo apt install ./amdgpu-install_7.1.70100-1_all.deb
sudo apt update
sudo apt install python3-setuptools python3-wheel
sudo usermod -a -G render,video $LOGNAME

reboot

sudo apt install -y rocm-opencl-runtime
sudo apt purge -y rocminfo || true
sudo amdgpu-install -y --usecase=rocm,graphics,hiplibsdk --no-dkms

sudo amdgpu-install -y --usecase=graphics,hiplibsdk,rocm,mllib --no-dkms
sudo apt install -y python3-venv git python3-setuptools python3-wheel \
  graphicsmagick-imagemagick-compat llvm clang cmake gcc g++ ninja-build radeontop \
  libamd-comgr2 libhsa-runtime64-1 librccl1 librocalution0 librocblas0 librocfft0 \
  librocm-smi64-1 librocsolver0 librocsparse0 rocm-device-libs-17 rocm-smi hipcc \
  libhiprand1 libhiprtc-builtins5

export PATH=$PATH:/opt/rocm/bin
export LD_LIBRARY_PATH=/opt/rocm/lib
sudo tee /etc/ld.so.conf.d/rocm.conf <<EOF
/opt/rocm/lib
/opt/rocm/lib64
EOF
sudo ldconfig

reboot

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip wheel setuptools

echo "📥 Downloading ROCm PyTorch wheels..."
pip install -r requirements.txt
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1/torch-2.8.0%2Brocm7.1.0.lw.git7a520360-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1/torchvision-0.23.0%2Brocm7.1.0.git824e8c87-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1/triton-3.4.0%2Brocm7.1.0.gitf9e5bf54-cp312-cp312-linux_x86_64.whl
wget https://repo.radeon.com/rocm/manylinux/rocm-rel-7.1/torchaudio-2.8.0%2Brocm7.1.0.git6e1c7fe9-cp312-cp312-linux_x86_64.whl
pip3 uninstall torch torchvision triton torchaudio
pip3 install torch-2.8.0+rocm7.1.0.lw.git7a520360-cp312-cp312-linux_x86_64.whl torchvision-0.23.0+rocm7.1.0.git824e8c87-cp312-cp312-linux_x86_64.whl torchaudio-2.8.0+rocm7.1.0.git6e1c7fe9-cp312-cp312-linux_x86_64.whl triton-3.4.0+rocm7.1.0.gitf9e5bf54-cp312-cp312-linux_x86_64.whl
pip install matplotlib pandas simpleeval
pip install comfyui-frontend-package --upgrade

echo "🧩 Installing ComfyUI extensions..."
cd custom_nodes
git clone -b AMD https://github.com/crystian/ComfyUI-Crystools.git && cd ComfyUI-Crystools && pip install -r requirements.txt && cd ..
git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager && cd comfyui-manager && pip install -r requirements.txt && cd ..
pip install diffusers
git clone https://github.com/pnikolic-amd/ComfyUI_MIGraphX.git && cd ComfyUI_MIGraphX && pip install -r requirements.txt && cd ..
git clone https://github.com/ltdrdata/comfyui-unsafe-torch
git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack comfyui-impact-pack && cd comfyui-impact-pack && pip install -r requirements.txt && cd ..
git clone https://github.com/ltdrdata/ComfyUI-Impact-Subpack && cd ComfyUI-Impact-Subpack && pip install -r requirements.txt && cd ..
git clone https://github.com/chengzeyi/Comfy-WaveSpeed.git
git clone https://github.com/willmiao/ComfyUI-Lora-Manager.git
cd ComfyUI-Lora-Manager
pip install -r requirements.txt
cd ..
cd ..

I then started ComfyUI, first with the plain python main.py.

OOM right off the bat.

I then used the following script:

#!/bin/bash
source .venv/bin/activate

# === ROCm paths ===
export ROCM_PATH="/opt/rocm"
export HIP_PATH="$ROCM_PATH"
export PATH="$ROCM_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$ROCM_PATH/lib:$ROCM_PATH/lib64:$LD_LIBRARY_PATH"
export PYTHONPATH="$ROCM_PATH/lib:$ROCM_PATH/lib64:$PYTHONPATH"
export HIP_VISIBLE_DEVICES=0
export ROCM_VISIBLE_DEVICES=0

# === GPU targeting ===
export HCC_AMDGPU_TARGET="gfx1201"   # Change for your GPU
export PYTORCH_ROCM_ARCH="gfx1201"   # e.g., gfx1030 for RX 6800/6900

# === Memory allocator tuning ===
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:6144"

# === Precision and performance ===
export TORCH_BLAS_PREFER_HIPBLASLT=1
export TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_BACKENDS="CK,TRITON,ROCBLAS"
export TORCHINDUCTOR_MAX_AUTOTUNE_GEMM_SEARCH_SPACE="BEST"
export TORCHINDUCTOR_FORCE_FALLBACK=1

# === Flash Attention ===
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
export FLASH_ATTENTION_BACKEND="flash_attn_triton_amd"
export FLASH_ATTENTION_TRITON_AMD_SEQ_LEN=4096
export USE_CK=ON
export TRANSFORMERS_USE_FLASH_ATTENTION=1
export TRITON_USE_ROCM=ON
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1

# === CPU threading ===
export OMP_NUM_THREADS=8
export MKL_NUM_THREADS=8
export NUMEXPR_NUM_THREADS=8

# === Experimental ROCm flags ===
export HSA_ENABLE_ASYNC_COPY=0
export HSA_ENABLE_SDMA=1
export MIOPEN_FIND_MODE=2
export MIOPEN_ENABLE_CACHE=1

# === MIOpen cache ===
export MIOPEN_USER_DB_PATH="$HOME/.config/miopen"
export MIOPEN_CUSTOM_CACHE_DIR="$HOME/.config/miopen"

# === Launch ComfyUI ===
python3 main.py --listen 0.0.0.0 --output-directory "$HOME/ComfyUI_Output" --normalvram --reserve-vram 2 --use-quad-cross-attention --fast

It now runs the first KSampler, but when it gets to the upscale step, it crashes the entire computer.

So I am now reverting to 6.4.4, which is stable at least. Please see if you can fix the instability; it prevents me from upgrading to newer versions.

druidican avatar Oct 31 '25 08:10 druidican

@druidican please try to use --use-pytorch-cross-attention instead of --use-quad-cross-attention. Also, during my experiments I noticed that reverting 2.73 to 1.0 in https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/sd.py#L283 reduces the frequency of this issue.

slojosic-amd avatar Oct 31 '25 10:10 slojosic-amd

Also, since https://github.com/comfyanonymous/ComfyUI/pull/10302 has been merged, MIOpen is no longer used at all. According to https://github.com/comfyanonymous/ComfyUI/issues/10460, we can try reverting this change to enable MIOpen again, but please make sure to set MIOPEN_FIND_MODE=2 before running ComfyUI, and delete the ~/.cache/miopen and ~/.config/miopen directories before starting it. This comment gives a very good explanation of how MIOpen works and why the first run takes so much time: https://github.com/comfyanonymous/ComfyUI/pull/10302#issuecomment-3425750147 (MIOpen needs to benchmark all solutions before caching the best one, and MIOPEN_FIND_MODE=2 should speed things up a little).
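
The two preparation steps mentioned here (clearing the MIOpen directories and setting MIOPEN_FIND_MODE=2) could be scripted roughly as follows. This is only a sketch: the helper names `reset_miopen_cache` and `miopen_env` are hypothetical, the directory paths are the two named in this comment, and it defaults to a dry run so nothing is deleted by accident.

```python
import os
import shutil
from pathlib import Path

# The two directories named above, relative to the home directory.
MIOPEN_DIRS = (".cache/miopen", ".config/miopen")


def reset_miopen_cache(home=None, dry_run=True):
    """Remove the MIOpen directories; return the paths that were found."""
    home = Path.home() if home is None else Path(home)
    found = []
    for rel in MIOPEN_DIRS:
        target = home / rel
        if target.is_dir():
            if not dry_run:
                shutil.rmtree(target)
            found.append(target)
    return found


def miopen_env(base_env=None):
    """Return a copy of the environment with MIOPEN_FIND_MODE=2 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MIOPEN_FIND_MODE"] = "2"
    return env


if __name__ == "__main__":
    for path in reset_miopen_cache(dry_run=True):
        print(f"would remove {path}")
    print("MIOPEN_FIND_MODE =", miopen_env()["MIOPEN_FIND_MODE"])
```

Call `reset_miopen_cache(dry_run=False)` to actually delete the directories, then pass `miopen_env()` as the `env` argument when starting ComfyUI from a subprocess.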

slojosic-amd avatar Oct 31 '25 11:10 slojosic-amd

@slojosic-amd I have tried as you suggested, and I get the following:

!!! Exception during processing !!! HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Traceback (most recent call last):
  File "/home/lasse/ComfyUI/execution.py", line 498, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "/home/lasse/ComfyUI/execution.py", line 316, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "/home/lasse/ComfyUI/custom_nodes/ComfyUI-Lora-Manager/py/metadata_collector/metadata_hook.py", line 165, in async_map_node_over_list_with_metadata
    results = await original_map_node_over_list(
  File "/home/lasse/ComfyUI/execution.py", line 290, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/lasse/ComfyUI/execution.py", line 278, in process_inputs
    result = f(**inputs)
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 876, in doit
    enhanced_img, cropped_enhanced, cropped_enhanced_alpha, mask, cnet_pil_list = FaceDetailer.enhance_face(
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 830, in enhance_face
    DetailerForEach.do_detail(image, segs, model, clip, vae, guide_size, guide_size_for_bbox, max_size, seed, steps, cfg,
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_pack.py", line 362, in do_detail
    enhanced_image, cnet_pils = core.enhance_detail(cropped_image, model, clip, vae, guide_size, guide_size_for_bbox, max_size,
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/core.py", line 383, in enhance_detail
    refined_latent = impact_sampling.ksampler_wrapper(model2, seed2, steps2, cfg2, sampler_name2, scheduler2, positive2, negative2,
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 209, in ksampler_wrapper
    refined_latent = separated_sample(model, True, seed, advanced_steps, cfg, sampler_name, scheduler,
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 182, in separated_sample
    res = sample_with_custom_noise(model, add_noise, seed, cfg, positive, negative, impact_sampler, sigmas, latent_image, noise=noise, callback=callback)
  File "/home/lasse/ComfyUI/custom_nodes/comfyui-impact-pack/modules/impact/impact_sampling.py", line 126, in sample_with_custom_noise
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image,
  File "/home/lasse/ComfyUI/comfy/sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 997, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/lasse/ComfyUI/comfy/k_diffusion/sampling.py", line 800, in sample_dpmpp_2m
    if old_denoised is None or sigmas[i + 1] == 0:
torch.AcceleratorError: HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Prompt executed in 96.52 seconds
Exception in thread Thread-4 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lasse/ComfyUI/main.py", line 233, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "/home/lasse/ComfyUI/comfy/model_management.py", line 1400, in soft_empty_cache
    torch.cuda.empty_cache()
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/memory.py", line 224, in empty_cache
    torch._C._cuda_emptyCache()
torch.AcceleratorError: HIP error: an illegal memory access was encountered
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

druidican avatar Oct 31 '25 15:10 druidican

============================== ComfyUI Diagnostics Report - 20251031_173935

SYSTEM INFO

OS and release:
PRETTY_NAME="Ubuntu 24.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.3 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Kernel: Linux Odin 6.14.0-34-generic #34~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Sep 23 15:35:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Uptime: 17:39:35 up 30 min, 1 user, load average: 1,11, 1,96, 1,76

GPU / ROCm INFO

rocm-smi output:

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  Node  IDs              Temp    Power  Partitions          SCLK     MCLK   Fan    Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)  (Mem, Compute, ID)
0       1     0x7550,   21786  35.0°C  19.0W  N/A, N/A, 0         1348Mhz  96Mhz  14.9%  auto  280.0W  48%    7%

=============================================== End of ROCm SMI Log ================================================

rocminfo (first 50 lines): ROCk module is loaded

HSA System Attributes

Runtime Version: 1.18
Runtime Ext Version: 1.14
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES

==========
HSA Agents


Agent 1


Name: AMD Ryzen 9 5900X 12-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 5900X 12-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5622
BDFID: 0
Internal Node ID: 0
Compute Unit: 24
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None Pool Info:
rocminfo failed

lspci | grep VGA: 09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 [Radeon RX 9070/9070 XT/9070 GRE] (rev c0)

ENVIRONMENT VARIABLES (filtered)

Relevant ROCm / HIP / PyTorch vars: PATH=/home/lasse/ComfyUI/.venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/snap/bin

PYTHON & VENV INFO

Python binary: /home/lasse/ComfyUI/.venv/bin/python

Python version: Python 3.12.3

Active venv (if any): /home/lasse/ComfyUI/.venv

Python packages (key ones):
sys.executable: /home/lasse/ComfyUI/.venv/bin/python
sys.path[0:3]: ['', '/usr/lib/python312.zip', '/usr/lib/python3.12']
torch version: 2.8.0+rocm7.1.0.git7a520360
torch.version.hip: 7.1.25424-4179531dcd
torch.version.cuda: None
torch.cuda.is_available: True

Installed packages (torch, comfy, rocm, pillow, accelerate): torch: 2.8.0+rocm7.1.0.lw.git7a520360 pillow: 12.0.0 accelerate: 1.11.0

COMFYUI INSTALLATION INFO

ComfyUI directory exists: /home/lasse/ComfyUI
Git remote and branch:
origin https://github.com/comfyanonymous/ComfyUI (fetch)
origin https://github.com/comfyanonymous/ComfyUI (push)
master

Custom nodes: custom_nodes comfyui-manager js misc notebooks docs glob .github scripts node_db snapshots .cache pycache .git components comfyui-unsafe-torch pycache .git ComfyUI-Easy-Use resources styles ComfyUI-Easy-Use-Frontend .github web_version py tools pycache .git locales wildcards ComfyUI-GGUF .github tools pycache comfyui-impact-pack test example_workflows js troubleshooting modules .github custom_wildcards notebook pycache .git locales wildcards ComfyUI-Crystools nodes samples web general core docs .others .github server pycache .git ComfyUI-MagCache examples assets .github pycache .git ComfyUI_MIGraphX pycache .git workflows Comfy-WaveSpeed assets .github pycache .git workflows comfyui_ultimatesdupscale modules repositories .github pycache ComfyUI-UltimateSDUpscale-GGUF .github pycache ComfyUI-Impact-Subpack modules .github pycache .git comfyui-image-saver js .github pycache saver image-size-tools assets .github pycache pycache ComfyUI-Lora-Manager example_workflows web tests docs static templates .github scripts refs wiki-images py pycache .git locales civitai

SYSTEM LOGS (last 3 minutes)

Collecting journalctl output... okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x00007577645d7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x000075777fdd7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin 
kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x0000757766dd7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x00007577735d7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at 
address 0x0000757761dd7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x000075775cdd7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x000075775f5d7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0084115B okt 31 17:38:59 Odin 
kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x00007577695d7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x000075775a5d7000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin 
kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32779) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in process python3 pid 8440 thread python3 pid 8440) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: in page starting at address 0x000075777fdeb000 from client 10 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MORE_FAULTS: 0x1 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: WALKER_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: PERMISSION_FAULTS: 0x5 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: MAPPING_ERROR: 0x0 okt 31 17:38:59 Odin kernel: amdgpu 0000:09:00.0: amdgpu: RW: 0x1

KERNEL GPU LOGS (amdgpu-related)

Recent dmesg lines with 'amdgpu': No amdgpu messages in dmesg

END OF REPORT

druidican avatar Oct 31 '25 16:10 druidican

Please try: export HSA_ENABLE_SDMA=0

Then see whether the illegal memory accesses keep happening. Judging by the journalctl messages, I think there's a bug in SDMA, which lets the GPU handle memory transfers from RAM. Disabling it will probably make diffusion slower, but it might be more stable.
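
A minimal sketch of applying this suggestion from Python instead of the shell. It assumes ComfyUI is started with `python main.py` from its checkout directory; the helper names and the `comfy_dir` argument are illustrative, not part of ComfyUI. Settings like `HSA_ENABLE_SDMA` are generally read when the ROCm runtime initializes, so the variable is placed in the child process's environment instead of being changed after `torch` has been imported.

```python
import os
import subprocess
import sys


def env_with_sdma_disabled(base=None):
    """Return a copy of the environment with HSA_ENABLE_SDMA=0 added.

    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["HSA_ENABLE_SDMA"] = "0"
    return env


def launch_comfyui(comfy_dir, extra_args=()):
    """Run `python main.py` in comfy_dir with SDMA disabled.

    comfy_dir is a hypothetical path to a ComfyUI checkout.
    """
    cmd = [sys.executable, "main.py", *extra_args]
    return subprocess.run(cmd, cwd=comfy_dir, env=env_with_sdma_disabled(), check=False)
```

Calling `launch_comfyui("/path/to/ComfyUI")` from the virtual environment's Python is then equivalent to running `export HSA_ENABLE_SDMA=0` followed by `python main.py` in that directory.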

csirikak avatar Nov 03 '25 21:11 csirikak

KERNEL GPU LOGS (amdgpu-related)

Recent dmesg lines with 'amdgpu': No amdgpu messages in dmesg

Try checking manually. I have the same issue, and dmesg shows something like:

[86454.437440] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32770) [86454.437623] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.437739] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85d640000 from client 10 [86454.437856] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841051 [86454.437971] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) [86454.438086] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [86454.438200] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0 [86454.438311] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [86454.438421] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0 [86454.438533] amdgpu 0000:03:00.0: amdgpu: RW: 0x1 [86454.438654] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32770) [86454.438745] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.438833] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85d641000 from client 10 [86454.438921] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0084115B [86454.439011] amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8) [86454.439100] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [86454.439190] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5 [86454.439278] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [86454.439368] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [86454.439457] amdgpu 0000:03:00.0: amdgpu: RW: 0x1 [86454.439550] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) [86454.439636] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.439724] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c843e7c000 from client 10 [86454.439812] amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x0084115B [86454.439900] amdgpu 0000:03:00.0: amdgpu: 
Faulty UTCL2 client ID: TCP (0x8) [86454.439989] amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x1 [86454.440078] amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x5 [86454.440165] amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [86454.440252] amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x1 [86454.440337] amdgpu 0000:03:00.0: amdgpu: RW: 0x1 [86454.440430] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) [86454.440519] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.440605] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c843e7d000 from client 10 [86454.440699] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) [86454.440785] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.440872] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85bf3c000 from client 10 [86454.440970] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) [86454.441056] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.441142] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85bf3d000 from client 10 [86454.441236] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32770) [86454.441322] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.441408] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85d643000 from client 10 [86454.441505] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) [86454.441592] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.441679] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c843e7f000 from client 10 [86454.441773] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:173 vmid:8 pasid:32770) 
[86454.441860] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.441947] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075c85bf3f000 from client 10 [86454.442041] amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32770) [86454.442128] amdgpu 0000:03:00.0: amdgpu: in process python3 pid 688725 thread python3 pid 688725) [86454.442214] amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x000075cccdfff000 from client 10 [86454.447455] workqueue: amdgpu_irq_handle_ih1 [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
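
The `GCVM_L2_PROTECTION_FAULT_STATUS` values in these logs can be unpacked by hand to check that the per-field lines belong to the same fault. A minimal sketch, assuming a bit layout inferred purely from the values printed in the logs: it reproduces the logged `MORE_FAULTS`, `WALKER_ERROR`, `PERMISSION_FAULTS`, `MAPPING_ERROR`, client ID and `RW` for both `0x00841051` and `0x0084115B`, but it is not taken from driver documentation and may not hold for other GPU generations. The names `FIELDS`, `decode_status` and `statuses_in_log` are illustrative.

```python
import re

# (shift, width) for each field. ASSUMPTION: layout inferred from the
# field values the kernel printed next to each status word in this thread.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CLIENT_ID": (9, 9),
    "RW": (18, 1),
}

STATUS_RE = re.compile(r"GCVM_L2_PROTECTION_FAULT_STATUS:0x([0-9A-Fa-f]+)")


def decode_status(status):
    """Split a status word into named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def statuses_in_log(text):
    """Return every fault status word found in dmesg/journalctl text."""
    return [int(hex_digits, 16) for hex_digits in STATUS_RE.findall(text)]
```

For example, `decode_status(0x0084115B)` yields `WALKER_ERROR` 5 and `MAPPING_ERROR` 1, matching the lines logged after that status, while `0x00841051` decodes to 0 for both.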

int13h82 avatar Nov 03 '25 21:11 int13h82

Please try: export HSA_ENABLE_SDMA=0

Then see whether the illegal memory accesses keep happening. Judging by the journalctl messages, I think there's a bug in SDMA, which lets the GPU handle memory transfers from RAM. Disabling it will probably make diffusion slower, but it might be more stable.

It seems to be at least getting better. Setup: RX 9070, ROCm 7.1, PyTorch ROCm 7 nightly from pytorch.org. With Qwen-image I was getting this error after 2-3 generations, and now I had 6 in a row just fine. I did have some other exported variables set for testing, though, so let me try a fresh console with only 'HSA_ENABLE_SDMA=0'.

The performance impact is negligible, about 0.05s/it. Given that ROCm 7 PyTorch is WAY faster than 6.x here, I'd say it is absolutely worth the hassle.

Upd: no, it is just "getting better". I still get the error, just after a while. My variable set was:
declare -x HSA_ENABLE_SDMA="0"
declare -x PYTORCH_ALLOC_CONF="max_split_size_mb:512"
declare -x PYTORCH_HIP_ALLOC_CONF="max_split_size_mb:512"
declare -x PYTORCH_HIP_FORCE_SHUTDOWN="1"

HSA_ENABLE_SDMA alone does not do the trick.
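
When several variables are exported at once, it helps to record exactly which set was active for each test run. A small sketch that parses `declare -x NAME="value"` lines (the format quoted in this comment, as printed by bash's `export -p`) into a dictionary and compares two sets. The function names are illustrative, and the parser deliberately handles only simple double-quoted values without escaped quotes.

```python
import re

# Matches lines such as: declare -x HSA_ENABLE_SDMA="0"
DECLARE_RE = re.compile(r'declare -x (\w+)="([^"]*)"')


def parse_declared(text):
    """Parse `declare -x NAME="value"` entries into a dict."""
    return dict(DECLARE_RE.findall(text))


def diff_env(first, second):
    """Return {name: (value_in_first, value_in_second)} for differing names.

    A name missing from one side is reported with None on that side.
    """
    names = sorted(set(first) | set(second))
    return {
        name: (first.get(name), second.get(name))
        for name in names
        if first.get(name) != second.get(name)
    }
```

Comparing the four-variable set against a run with only `HSA_ENABLE_SDMA` shows at a glance that the three `PYTORCH_*` settings are the difference between the two tests.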

int13h82 avatar Nov 03 '25 21:11 int13h82

Well... no dice. Now I cannot make a single picture. It fails during the first KSampler, every time.

Traceback (most recent call last): File "/home/lasse/ComfyUI/execution.py", line 510, in execute output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/execution.py", line 324, in get_output_data return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/custom_nodes/ComfyUI-Lora-Manager/py/metadata_collector/metadata_hook.py", line 165, in async_map_node_over_list_with_metadata results = await original_map_node_over_list( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/execution.py", line 298, in _async_map_node_over_list await process_inputs(input_dict, i) File "/home/lasse/ComfyUI/execution.py", line 286, in process_inputs result = f(**inputs) ^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 835, in sample samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/samplers.py", line 1035, in sample output = executor.execute(noise, latent_image, sampler, sigmas, 
denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute return self.original(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/samplers.py", line 997, in outer_sample output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/samplers.py", line 980, in inner_sample samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute return self.original(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/samplers.py", line 752, in sample samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/k_diffusion/sampling.py", line 199, in sample_euler denoised = model(x, sigma_hat * s_in, **extra_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/lasse/ComfyUI/comfy/samplers.py", line 401, in call out = self.inner_model(x, sigma, model_options=model_options, seed=seed) 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 953, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 960, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 963, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 381, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/samplers.py", line 324, in calc_cond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/ComfyUI-MagCache/nodes.py", line 807, in unet_wrapper_function
    return model_function(input, timestep, **c)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/model_base.py", line 161, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/model_base.py", line 203, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/chroma/model.py", line 261, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/comfy/ldm/chroma/model.py", line 284, in _forward
    out = self.forward_orig(img, img_ids, context, txt_ids, timestep, guidance, control, transformer_options, attn_mask=kwargs.get("attention_mask", None))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/lasse/ComfyUI/custom_nodes/ComfyUI-MagCache/nodes.py", line 680, in magcache_chroma_forward
    self.residual_cache[self.cnt%2] = (img - ori_img).to(mm.unet_offload_device())
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

torch.AcceleratorError: HIP error: an illegal memory access was encountered
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Prompt executed in 52.77 seconds
Exception in thread Thread-6 (prompt_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lasse/ComfyUI/main.py", line 242, in prompt_worker
    comfy.model_management.soft_empty_cache()
  File "/home/lasse/ComfyUI/comfy/model_management.py", line 1432, in soft_empty_cache
    torch.cuda.empty_cache()
  File "/home/lasse/ComfyUI/.venv/lib/python3.12/site-packages/torch/cuda/memory.py", line 224, in empty_cache
    torch._C._cuda_emptyCache()
torch.AcceleratorError: HIP error: an illegal memory access was encountered
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.
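
The same HIP error resurfaces in `torch.cuda.empty_cache()` after the prompt has already failed, which fits the error text's warning that HIP kernel errors may be reported asynchronously at a later API call. A minimal sketch of one way to follow the `AMD_SERIALIZE_KERNEL=3` hint from that message while debugging, assuming the variable has to be in the environment before the HIP runtime starts. The helper name `serialized_hip_env` and the launcher layout are hypothetical and not part of ComfyUI:

```python
import os
import subprocess
import sys


def serialized_hip_env(base_env=None):
    """Return a copy of the environment with AMD_SERIALIZE_KERNEL=3 set.

    The variable name and value are taken from the hint in the HIP error
    message; everything else in the environment is passed through unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AMD_SERIALIZE_KERNEL"] = "3"
    return env


if __name__ == "__main__":
    # Hypothetical usage: python launch_debug.py /path/to/ComfyUI/main.py
    # The target script runs in a child process so the variable is already
    # present when that process initializes the GPU runtime.
    if len(sys.argv) > 1:
        subprocess.run(
            [sys.executable, *sys.argv[1:]],
            env=serialized_hip_env(),
            check=False,
        )
```

Serialized kernel launches are expected to be slower, so this is only meant for narrowing down which call actually faults.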

druidican avatar Nov 04 '25 04:11 druidican

Would it help if I posted a description of my Linux setup, including environment variables and the like?
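
A setup description of this kind can be gathered with a short script. The sketch below is only an illustration: the prefix list is an assumption about which environment variables tend to matter for ROCm and PyTorch runs, and the function name `collect_setup` is hypothetical.

```python
import os
import platform

# Assumed prefixes for variables that commonly influence ROCm/PyTorch runs;
# extend or trim this tuple to match the flags actually in use.
PREFIXES = ("HSA_", "HIP_", "ROCM_", "AMD_", "PYTORCH_", "TORCH_")


def collect_setup(environ=None):
    """Collect kernel, Python version, and GPU-related environment variables."""
    environ = os.environ if environ is None else environ
    variables = {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith(PREFIXES)
    }
    return {
        "kernel": platform.release(),
        "python": platform.python_version(),
        "variables": variables,
    }


if __name__ == "__main__":
    report = collect_setup()
    print(f"kernel: {report['kernel']}")
    print(f"python: {report['python']}")
    for name, value in report["variables"].items():
        print(f"{name}={value}")
```

Pasting the output next to the ROCm and PyTorch versions gives others enough context to compare setups.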

druidican avatar Nov 04 '25 09:11 druidican

nov 06 06:01:00 Odin systemd[1]: tmp-snap.rootfs_0VjBMz.mount: Deactivated successfully.
nov 06 06:01:01 Odin kernel: audit: type=1400 audit(1762405261.015:153): apparmor="DENIED" operation="open" class="file" profile="snap.f>
nov 06 06:01:01 Odin systemd[2257]: snap.firmware-updater.firmware-notifier.service: Consumed 1.374s CPU time.
nov 06 06:01:01 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 000000005abb2ae3 pin failed
nov 06 06:01:01 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 06 06:01:01 Odin gnome-shell[2562]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 06 06:01:01 Odin kernel: workqueue: svm_range_restore_work [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
nov 06 06:01:01 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:01 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:01 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:01 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:01 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 0000000085653ad3 pin failed
nov 06 06:01:02 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 06 06:01:02 Odin gnome-shell[2562]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 06 06:01:02 Odin gnome-shell[2562]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 06 06:01:02 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 0000000064409088 pin failed
nov 06 06:01:02 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 06 06:01:02 Odin kernel: workqueue: svm_range_restore_work [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
nov 06 06:01:02 Odin xdg-desktop-por[3215]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin xdg-desktop-por[3253]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gnome-shell[4791]: (EE) failed to read Wayland events: Broken pipe
nov 06 06:01:02 Odin update-notifier[5080]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gsd-wacom[2791]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gsd-media-keys[2762]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gsd-power[2764]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin brave[4519]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gsd-color[2746]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin evolution-alarm[5374]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gsd-keyboard[2760]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin evolution-alarm[2875]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin systemd[2257]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin polkitd[999]: Unregistered Authentication Agent for unix-session:2 (system bus name :1.79, object path /org/freedes>
nov 06 06:01:02 Odin gnome-terminal-[4452]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin gnome-software[2857]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin solaar[3070]: Error reading events from display: Kanalen blev brudt
nov 06 06:01:02 Odin systemd[2257]: [email protected]: Main process exited, code=killed, status=9/KILL
nov 06 06:01:02 Odin systemd[2257]: org.gnome.SettingsDaemon.Keyboard.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: org.gnome.SettingsDaemon.MediaKeys.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: org.gnome.SettingsDaemon.Power.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin kernel: amdgpu 0000:09:00.0: amdgpu: VM memory stats for proc xdg-desktop-por(6223) task xdg-deskto:cs0(3215) is no>
nov 06 06:01:02 Odin systemd[2257]: org.gnome.SettingsDaemon.Wacom.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: xdg-desktop-portal-gnome.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: xdg-desktop-portal-gnome.service: Failed with result 'exit-code'.
nov 06 06:01:02 Odin systemd[2257]: xdg-desktop-portal-gnome.service: Consumed 1.189s CPU time.
nov 06 06:01:02 Odin systemd[2257]: xdg-desktop-portal-gtk.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: xdg-desktop-portal-gtk.service: Failed with result 'exit-code'.
nov 06 06:01:02 Odin systemd[2257]: gnome-terminal-server.service: Main process exited, code=exited, status=1/FAILURE
nov 06 06:01:02 Odin systemd[2257]: gnome-terminal-server.service: Failed with result 'exit-code'.
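
The journal excerpt above is dominated by a few repeated amdgpu messages, and the `error -12` in the framebuffer lines corresponds to errno 12 (`ENOMEM`), matching the Danish "Kan ikke tildele hukommelse" ("Cannot allocate memory") from gnome-shell. A small sketch for tallying these messages from a saved log, assuming one journal entry per line; the labels and the function name `summarize` are made up for illustration, while the matched substrings are copied from the log:

```python
import errno
import re
import sys
from collections import Counter

# Substrings copied from the journal excerpt; the labels are display names only.
PATTERNS = {
    "command submission out of memory": "Not enough memory for command submission",
    "buffer pin failed": "pin failed",
    "framebuffer pin error": "Failed to pin framebuffer with error",
    "page flip failed": "Page flip failed",
}


def summarize(lines):
    """Count known amdgpu messages and decode any 'with error -N' codes."""
    counts = Counter()
    codes = set()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
        match = re.search(r"with error -(\d+)", line)
        if match:
            # Map the numeric kernel error to its symbolic errno name.
            codes.add(errno.errorcode.get(int(match.group(1)), "unknown"))
    return counts, codes


if __name__ == "__main__":
    counts, codes = summarize(sys.stdin)
    for label, count in counts.most_common():
        print(f"{count:5d}  {label}")
    print("errno names:", ", ".join(sorted(codes)) or "none")
```

It could be fed with something like `journalctl -b | python summarize.py`, where the script file name is arbitrary.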

druidican avatar Nov 06 '25 05:11 druidican

There's a workaround... I suspect it's a driver issue:

https://github.com/ROCm/TheRock/issues/1795#issuecomment-3519877539

CSFFlame avatar Nov 12 '25 04:11 CSFFlame

I have now tried the workaround. I can now make 2 pictures, but the 3rd gives this error message:

nov 12 20:07:55 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:07:55 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:07:55 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:07:55 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000ff287153 pin failed
nov 12 20:07:55 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:07:55 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:07:55 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000cddb2b91 pin failed
nov 12 20:07:55 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:07:55 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:07:56 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:07:56 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000ff287153 pin failed
nov 12 20:07:56 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:08:03 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:03 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:04 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000f5ac28ec pin failed
nov 12 20:08:04 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:08:04 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:08:10 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:08:10 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000ff287153 pin failed
nov 12 20:08:10 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:08:11 Odin kernel: amdgpu 0000:09:00.0: amdgpu: 00000000f5ac28ec pin failed
nov 12 20:08:11 Odin kernel: [drm:amdgpu_dm_plane_helper_prepare_fb [amdgpu]] ERROR Failed to pin framebuffer with error -12
nov 12 20:08:11 Odin gnome-shell[2586]: Page flip failed: drmModeAtomicCommit: Kan ikke tildele hukommelse
nov 12 20:08:13 Odin kernel: [drm:amdgpu_cs_ioctl [amdgpu]] ERROR Not enough memory for command submission!
nov 12 20:09:24 Odin gnome-shell[2586]: GNOME Shell crashed with signal 11
nov 12 20:09:24 Odin gnome-shell[2586]: == Stack trace for context 0x5868327d4830 ==

druidican avatar Nov 12 '25 19:11 druidican

Correction: after reinstalling Ubuntu from scratch, removing most of my startup flags, and applying the GRUB changes, ComfyUI has become stable but slower. Still, it's a clear improvement.

druidican avatar Nov 12 '25 21:11 druidican