
Pinned memory causes error with GGUF model

Open nestflow opened this issue 1 month ago • 52 comments

Custom Node Testing

Expected Behavior

It works when pinned memory is not enabled.

Actual Behavior

The bug seems to happen when trying to unload the model. I'm using the Wan 2.2 model with the ComfyUI-GGUF node, and it breaks between the two samplers. It may be an issue with the GGUF node, but I'm reporting it here since it only happens when pinned memory is enabled. The bug also seems to happen only at relatively high video resolutions.
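For context, here is a minimal sketch of the mechanism involved (my own illustration, not ComfyUI's actual code): pinned (page-locked) host memory combined with `non_blocking=True` transfers, which is the `Tensor.to()` path the traceback below ends in.

```python
import torch

# Minimal sketch of pinned-memory offloading, assuming a CUDA build of PyTorch.
# This illustrates the mechanism only; it is not ComfyUI's implementation.
weights = torch.randn(1024, 1024).pin_memory()   # page-locked host buffer
gpu = weights.to("cuda", non_blocking=True)      # async host-to-device copy

# Unloading goes back through Tensor.to(); with pinned memory the copy can be
# asynchronous, so errors may surface later at an unrelated API call (which is
# why the traceback warns that the stack may be misleading).
offloaded = gpu.to("cpu", non_blocking=True)
torch.cuda.synchronize()                         # force pending copies to finish
```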

Steps to Reproduce

This is not the actual workflow I used, since that one includes too many unrelated custom nodes, but I can reproduce the bug with this one: bugged_workflow.json.

Debug Logs

- **Node ID:** 360
- **Node Type:** SamplerCustom
- **Exception Type:** torch.AcceleratorError
- **Exception Message:** CUDA error: invalid argument
Search for 'cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


  File "D:\Projects\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)

  File "D:\Projects\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)

  File "D:\Projects\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 658, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)

  File "D:\Projects\ComfyUI\comfy\sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)

  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 919, in partially_load
    self.unpatch_model(self.offload_device, unpatch_weights=unpatch_weights)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 77, in unpatch_model
    return super().unpatch_model(device_to=device_to, unpatch_weights=unpatch_weights)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 832, in unpatch_model
    self.model.to(device_to)
    ~~~~~~~~~~~~~^^^^^^^^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1371, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 957, in _apply
    param_applied = fn(param)

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^

  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
    new = super().to(*args, **kwargs)

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\_tensor.py", line 1654, in __torch_function__
    ret = func(*args, **kwargs)

Other

No response

nestflow avatar Nov 06 '25 05:11 nestflow

Same on RTX 4060. Same in non-GGUF workflows.

tester4488 avatar Nov 06 '25 05:11 tester4488

Have you tried updating the gguf node and comfyui?

comfyanonymous avatar Nov 06 '25 06:11 comfyanonymous

Have you tried updating the gguf node and comfyui?

I can confirm they are both the latest.

nestflow avatar Nov 06 '25 06:11 nestflow

@nestflow how much VRAM do you have? My guess is you have 24GB based on what's happening, but these code paths do depend on how much VRAM you are running, and the number will help me reproduce.

rattus128 avatar Nov 06 '25 10:11 rattus128

I can't reproduce this on Linux at all with the workflow attached, and I put latent upscalers between the two samplers to test both up and down changes of the partial load. I trapped the exact code path in the trace, but no reproduction.

There's some suspicious code in the core around weight backups that doesn't look right, but I'll take all of this to Windows and see if I get the repro first.

rattus128 avatar Nov 06 '25 11:11 rattus128

@nestflow how much VRAM do you have? My guess is you have 24GB based on what's happening, but these code paths do depend on how much VRAM you are running, and the number will help me reproduce.

I have 16GB VRAM and use Windows.

nestflow avatar Nov 06 '25 15:11 nestflow

I've tried everything, but the error persists. I've now switched to fp8_scaled and, to be honest, I'm getting even better results with it, so I'm grateful for the error message :-D

hanswurst232 avatar Nov 06 '25 20:11 hanswurst232

Same here on Windows :( 16 GB VRAM. Not a GGUF workflow; I am using block swap and torch.compile.

PopHorn avatar Nov 06 '25 20:11 PopHorn

Stop using block swap, it's completely useless with native models.

comfyanonymous avatar Nov 06 '25 20:11 comfyanonymous

I also have 16 GB (RTX 5060 Ti, 64 GB RAM, Windows) and I run unquantized fp16 Wan 2.2 I2V + 4-step LoRAs and fp32 VAE without problems (because I am an obsessive-compulsive quality freak). One difference might be that I install the nightly torch 2.10.0+cu130 almost every day (today is no exception), along with the corresponding torchvision and torchaudio. I also run my own builds of xformers, flash attention 2.8.3, and sageattention from the latest available git versions and CUDA 13.0.2 (I make new builds within about a week of updates to those projects). And of course I always use the latest Nvidia driver. For what it's worth for the subject of this thread.

jovan2009 avatar Nov 06 '25 20:11 jovan2009

Stop using block swap, it's completely useless with native models.

Really? But I'd definitely go OOM at higher video resolutions.

The bug persists. The error comes when the KSampler changes from the high-noise model to the low-noise one. I've just loaded the workflow that worked perfectly yesterday.

PopHorn avatar Nov 06 '25 21:11 PopHorn

Stop using block swap, it's completely useless with native models.

I've just used the generic Wan 2.2 I2V workflow from the template library.

PopHorn avatar Nov 06 '25 21:11 PopHorn

Same issue for me. This seems to be a serious bug. Wan 2.2 workflows were working fine with v0.3.67-27-g135fa49e (2025-11-01) until I ran update.bat yesterday, and they started crashing when switching between high and low noise.

I restored my install from a backup and it immediately started working again. Definitely something in the last update is causing the problem. Using the UNet GGUF loader, all updated.

Windows Portable. RTX 5090. Torch 2.9 cu130.

slikvik55 avatar Nov 06 '25 22:11 slikvik55

I should add, I don't have pinned memory enabled, at least I don't think I do! Didn't even know it existed until now.

slikvik55 avatar Nov 06 '25 22:11 slikvik55

I should add, I don't have pinned memory enabled, at least I don't think I do! Didn't even know it existed until now.

This is because commit 1d69245981f9fb3861018613246042296d887dd3 made pinned memory the default. As a workaround, you can pass --disable-pinned-memory to temporarily disable it, and that works for me. But if this is to stay the default, the bug still needs to be fixed.
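For anyone curious what the flag changes, here is a rough sketch of the general mechanism (my assumption for illustration, not the actual ComfyUI code; the real logic lives in comfy/model_management.py):

```python
import torch

def offload_buffer(t: torch.Tensor, pinned_enabled: bool) -> torch.Tensor:
    # Hypothetical helper: with pinned memory enabled (now the default), CPU
    # offload buffers get page-locked so GPU<->CPU copies can run async; with
    # --disable-pinned-memory they stay pageable and copies are synchronous.
    if pinned_enabled and t.device.type == "cpu":
        return t.pin_memory()
    return t
```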

nestflow avatar Nov 06 '25 22:11 nestflow

Stop using block swap, it's completely useless with native models.

I've just used the generic Wan 2.2 I2V workflow from the template library.

That's a different error message. Can I get everything that's in the show report?

rattus128 avatar Nov 06 '25 22:11 rattus128

This has a chance of fixing it.

https://github.com/comfyanonymous/ComfyUI/pull/10672/files

Anyone still tracking this, please feel free to give it a go. It has a very good chance of fixing it for @PopHorn and a decent chance of helping @nestflow (based on differences in your messages).

If it still errors, paste everything even if it looks the same. This does have a chance of changing error messages and would give a lot of information.

rattus128 avatar Nov 07 '25 02:11 rattus128

This has a chance of fixing it.

https://github.com/comfyanonymous/ComfyUI/pull/10672/files

Anyone still tracking this, please feel free to give it a go. It has a very good chance of fixing it for @PopHorn and a decent chance of helping @nestflow (based on differences in your messages).

If it still errors, paste everything even if it looks the same. This does have a chance of changing error messages and would give a lot of information.

Hi @rattus128, thanks for your help! I did some quick tests of the updated code, and the error no longer happens in the native workflow I posted above. My original workflow still errors, though, and I've narrowed it down to the WanVideoNAG node from KJNodes as the likely cause. So I guess some compatibility problems remain; I will open an issue in the KJNodes repo as a more appropriate place for discussion and link this issue. In any case, here are the error messages for you to check.

Error Details

  • Node ID: 360
  • Node Type: SamplerCustom
  • Exception Type: torch.AcceleratorError
  • Exception Message: CUDA error: invalid argument. Search for 'cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Stack Trace

  File "D:\Projects\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)

  File "D:\Projects\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)

  File "D:\Projects\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 658, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)

  File "D:\Projects\ComfyUI\comfy\sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)

  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 919, in partially_load
    self.unpatch_model(self.offload_device, unpatch_weights=unpatch_weights)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 77, in unpatch_model
    return super().unpatch_model(device_to=device_to, unpatch_weights=unpatch_weights)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 832, in unpatch_model
    self.model.to(device_to)
    ~~~~~~~~~~~~~^^^^^^^^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1371, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 957, in _apply
    param_applied = fn(param)

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^

  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
    new = super().to(*args, **kwargs)

  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\_tensor.py", line 1654, in __torch_function__
    ret = func(*args, **kwargs)

nestflow avatar Nov 07 '25 03:11 nestflow

I encountered this problem right after updating ComfyUI today.
CUDA error: out of memory. Search for 'cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ucboys-cloud avatar Nov 07 '25 09:11 ucboys-cloud

I encountered this problem right after updating ComfyUI today. CUDA error: out of memory. Search for 'cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Can you paste the entire log and describe your workflow and hardware?

rattus128 avatar Nov 07 '25 09:11 rattus128

I encountered this problem right after updating ComfyUI today. CUDA error: out of memory. Search for 'cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Can you paste the entire log and describe your workflow and hardware?

Requested to load WAN21
loaded partially; 5167.08 MB usable, 5167.08 MB loaded, 4633.39 MB offloaded, lowvram patches: 0
Attempting to release mmap (452)
!!! Exception during processing !!! CUDA error: out of memory
Search for 'cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "D:\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "D:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1559, in sample
    return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
  File "D:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 60, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_smznodes\smZNodes.py", line 131, in KSampler_sample
    return orig_fn(*args, **kwargs)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1163, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\comfyui_smznodes\smZNodes.py", line 149, in sample
    return orig_fn(*args, **kwargs)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 935, in partially_load
    raise e
  File "D:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 932, in partially_load
    self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
  File "D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\gguf\pig.py", line 85, in load
    m.to(self.load_device).to(self.offload_device)
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1371, in to
    return self._apply(convert)
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 957, in _apply
    param_applied = fn(param)
  File "D:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
    return t.to(
        device,
        dtype if t.is_floating_point() or t.is_complex() else None,
        non_blocking,
    )
torch.AcceleratorError: CUDA error: out of memory
Search for 'cudaErrorMemoryAllocation' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Prompt executed in 24.02 seconds
24GB vram 64gb ram, Total VRAM 24455 MB, total RAM 65253 MB
pytorch version: 2.9.0+cu128

ucboys-cloud avatar Nov 07 '25 09:11 ucboys-cloud

Thank you very much. What are the dimensions and frame count of your generated video?

rattus128 avatar Nov 07 '25 09:11 rattus128

Thank you very much. What are the dimensions and frame count of your generated video?

I have been stably using this workflow (Wan 2.2 Q5 GGUF) for over a month. It runs at 720×1280 resolution with 81 frames and has never run out of VRAM. I think this must be due to the recently updated pinned memory.

ucboys-cloud avatar Nov 07 '25 09:11 ucboys-cloud

Thank you very much. What are the dimensions and frame count of your generated video?

I have been stably using this workflow (Wan 2.2 Q5 GGUF) for over a month. It runs at 720×1280 resolution with 81 frames and has never run out of VRAM. I think this must be due to the recently updated pinned memory.

Hey, this is using a different GGUF custom node pack. Currently I am actively debugging the city96 GGUF and have already found and fixed issues with respect to pinned memory; the calcuis GGUF may need to do some catch-up. The code in your custom node pack, I assume, is from here:

https://github.com/calcuis/gguf/blob/main/pig.py

Which is already one fix behind.
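For reference, the traceback above fails inside this pattern from pig.py line 85, sketched below with hypothetical device arguments. The round trip has to fully materialize the module on the GPU before offloading it again, so it is fragile under memory pressure:

```python
import torch

def load_roundtrip(m: torch.nn.Module, load_device: str, offload_device: str):
    # Pattern from calcuis/gguf pig.py: move everything to the GPU, then
    # immediately back to the offload device. Peak VRAM is the whole module,
    # so the first hop can raise "CUDA error: out of memory" before the
    # second hop ever runs.
    return m.to(load_device).to(offload_device)
```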

Can you please try the city96 GGUF loader to see if it resolves?

rattus128 avatar Nov 07 '25 10:11 rattus128

I also have 16 GB (RTX 5060 Ti, 64 GB RAM, Windows) and I run unquantized fp16 Wan 2.2 I2V + 4-step LoRAs and an fp32 VAE without problems (because I am an obsessive-compulsive quality freak). One difference might be that I install the nightly torch 2.10.0+cu130 almost every day (today is no exception), along with the corresponding torchvision and torchaudio. I also run my own builds of xformers, flash attention 2.8.3, and sageattention from the latest available git versions, built against CUDA 13.0.2 (I make new builds within about a week of updates to those projects). And of course I always use the latest Nvidia driver. For what it's worth for the subject of this thread.
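If anyone wants to replicate that nightly setup, the usual PyTorch nightly install command looks like this (the cu130 index URL is an assumption based on PyTorch's naming scheme; check pytorch.org for the current one):

```
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu130
```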

I would like to add, regarding my positive experiences, that for some time I have always generated Wan 2.2 I2V videos at 768x768. The 1:1 aspect ratio fits my goals, and 768 is a good tradeoff between quality and speed (which also fits my goals). I arrived at 768x768 by trial and error; what I noticed is that the total rendering time divided by the total pixel count (768x768 = 589824) is much better than with other resolutions and aspect ratios (like 832x480 = 399360). I'm not sure whether that impression is real; I didn't do in-depth benchmarks. I was just surprised, since I expected a much bigger performance hit moving from 832x480 to 768x768. I have the feeling that 768x768 is somehow processed much more efficiently (being a multiple of 256 or something), but I don't understand the mechanics of inference well enough to make a better hypothesis. Anyway, for me 768x768 is kind of a magic resolution; I upscale if I need more. See the sketch below for the metric I mean.
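The metric described above is rendering time per pixel; a quick sketch of the comparison (the timings are made up purely for illustration):

```python
# Made-up timings purely for illustration -- substitute your own runs.
runs = {
    "832x480": (832 * 480, 120.0),  # (pixels per frame, total seconds)
    "768x768": (768 * 768, 150.0),
}
for name, (pixels, seconds) in runs.items():
    # Lower is better: seconds spent per megapixel of output.
    print(f"{name}: {seconds / (pixels / 1e6):.1f} s per megapixel")
```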

jovan2009 avatar Nov 07 '25 10:11 jovan2009

This has a chance of fixing it.

https://github.com/comfyanonymous/ComfyUI/pull/10672/files

Anyone still tracking this, please feel free to give it a go. It has a very good chance of fixing things for @PopHorn and a decent chance of helping @nestflow (based on the differences in your messages).

If it still errors, paste everything even if it looks the same. This change may alter the error messages, and that would give a lot of information.

I'm a bit of a dummy and a noob. How do I apply that fix?

PopHorn avatar Nov 07 '25 10:11 PopHorn

> This has a chance of fixing it. https://github.com/comfyanonymous/ComfyUI/pull/10672/files Anyone still tracking this, please feel free to give it a go. It has a very good chance of fixing things for @PopHorn and a decent chance of helping @nestflow (based on the differences in your messages). If it still errors, paste everything even if it looks the same. This change may alter the error messages, and that would give a lot of information.

> I'm a bit of a dummy and a noob. How do I apply that fix?

Comfy merged that this morning, so if you update to the latest git master version of comfy you will get this fix. If you are just working with stable releases, the fix will come in the next stable.
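For a git install, that usually means something like the following (assuming the standard clone; the Windows portable builds also ship an update script in their update folder):

```
cd ComfyUI
git pull
```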

rattus128 avatar Nov 07 '25 10:11 rattus128

> This has a chance of fixing it. https://github.com/comfyanonymous/ComfyUI/pull/10672/files Anyone still tracking this, please feel free to give it a go. It has a very good chance of fixing things for @PopHorn and a decent chance of helping @nestflow (based on the differences in your messages). If it still errors, paste everything even if it looks the same. This change may alter the error messages, and that would give a lot of information.

> I'm a bit of a dummy and a noob. How do I apply that fix?

> Comfy merged that this morning, so if you update to the latest git master version of comfy you will get this fix. If you are just working with stable releases, the fix will come in the next stable.

So far so good. Native WFs are working for me now. TY

PopHorn avatar Nov 07 '25 13:11 PopHorn

Anyone tracking this with a reproducer, please try:

https://github.com/city96/ComfyUI-GGUF/pull/357

for multiple fixes.
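For anyone unfamiliar with testing an unmerged PR, GitHub exposes each pull request as a fetchable ref (the local branch name here is arbitrary):

```
cd ComfyUI/custom_nodes/ComfyUI-GGUF
git fetch origin pull/357/head:pr-357
git checkout pr-357
```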

rattus128 avatar Nov 07 '25 14:11 rattus128

I can also confirm that pinned memory definitely causes issues. When trying to run the Qwen image model Q8 GGUF after the update, I kept getting a CUDA OOM (which had never happened before), but after starting ComfyUI with "--disable-pinned-memory", the problem was fixed. Others on reddit seem to have similar issues: https://www.reddit.com/r/comfyui/comments/1opqlxv/new_update/
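For reference, that flag goes on the normal launch command line (main.py is the standard ComfyUI entrypoint; portable builds pass flags through their run .bat files):

```
python main.py --disable-pinned-memory
```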

Here's the error I got:

loaded partially; 13202.71 MB usable, 13202.64 MB loaded, 7658.84 MB offloaded, lowvram patches: 0
Attempting to release mmap (562)
!!! Exception during processing !!! CUDA error: out of memory
Search for `cudaErrorMemoryAllocation` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 835, in sample
    samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 935, in partially_load
    raise e
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 932, in partially_load
    self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 103, in load
    m.to(self.load_device).to(self.offload_device)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1371, in to
    return self._apply(convert)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 957, in _apply
    param_applied = fn(param)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
    return t.to(
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
    new = super().to(*args, **kwargs)
  File "C:\Users\user\Documents\programs\AI\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\_tensor.py", line 1654, in __torch_function__
    ret = func(*args, **kwargs)
torch.AcceleratorError: CUDA error: out of memory
Search for `cudaErrorMemoryAllocation` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Prompt executed in 49.62 seconds

EkstraTuta avatar Nov 07 '25 19:11 EkstraTuta