OOM error on 16 GB VRAM with Mocha workflow

Open TomiTom1234 opened this issue 1 month ago • 4 comments

I tried the Mocha workflow and downloaded all the necessary models, but I get an OOM error when I increase the frame count above 81 (which equals 3 seconds). Is there any step that would stop this OOM error? (I tried increasing the Blocks value from 40, but it only allows a maximum of 48.)

Error during sampling: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Exception in thread Thread-17 (prompt_worker):
Traceback (most recent call last):
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper_\nodes_sampler.py", line 3058, in process
    raise e
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper_\nodes_sampler.py", line 2940, in process
    noise_pred, noise_pred_ovi, self.cache_state = predict_with_cfg(
                                                   ~~~~~~~~~~~~~~~~^
        latent_model_input,
        ^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
        wananim_face_pixels=wananim_face_pixels, wananim_pose_latents=wananim_pose_latents, uni3c_data = uni3c_data, latent_model_input_ovi=latent_model_input_ovi, flashvsr_LQ_latent=flashvsr_LQ_latent,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper_\nodes_sampler.py", line 1550, in predict_with_cfg
    raise e
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper_\nodes_sampler.py", line 1421, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper_\wanvideo\modules\model.py", line 2944, in forward
    block.to(self.offload_device, non_blocking=self.use_non_blocking)
    ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1369, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 928, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 928, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 955, in _apply
    param_applied = fn(param)
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1355, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^
torch.AcceleratorError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 1043, in _bootstrap_inner
  File "threading.py", line 994, in run
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\main.py", line 202, in prompt_worker
    e.execute(item[2], prompt_id, extra_data, item[4])
    ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 664, in execute
    asyncio.run(self.execute_async(prompt, prompt_id, extra_data, execute_outputs))
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "asyncio\runners.py", line 195, in run
  File "asyncio\runners.py", line 118, in run
  File "asyncio\base_events.py", line 725, in run_until_complete
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 711, in execute_async
    result, error, ex = await execute(self.server, dynamic_prompt, self.caches, node_id, extra_data, executed, prompt_id, execution_list, pending_subgraph_results, pending_async_nodes, ui_node_outputs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 588, in execute
    input_data_formatted[name] = [format_value(x) for x in inputs]
                                  ~~~~~~~~~~~~^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 402, in format_value
    return str(x)
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor.py", line 590, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 726, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 647, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 379, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
                           ~~~~~~~~~~~~~~~~~~~^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 415, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
                        ~~~~~~~~~~~~~~~~~~~^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 415, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
                        ~~~~~~~~~~~~~~~~~~~^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 415, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
                        ~~~~~~~~~~~~~~~~~~~^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor_str.py", line 405, in get_summarized_data
    return torch.cat(
           ~~~~~~~~~^
        (self[: PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems :])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
torch.AcceleratorError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

TomiTom1234 avatar Nov 07 '25 11:11 TomiTom1234

CUDA error: out of memory

This is actually a system RAM error, not a VRAM error, so using less block swap and disabling non-blocking on the block swap node (if it's enabled) may help.
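
To make the RAM/VRAM trade-off concrete, here is a rough sketch of how block swap divides the transformer weights between the two. It assumes all blocks are the same size, which is a simplification; `block_swap_split` is a hypothetical helper, not part of the wrapper. The total (13409.26 MB over 40 blocks) is taken from the block swap summary posted later in this thread, and the 6369.40 MB / 7039.86 MB split in that log is consistent with 19 of the 40 blocks being held in system RAM.

```python
def block_swap_split(total_mb: float, num_blocks: int, blocks_to_swap: int) -> tuple[float, float]:
    """Return (cpu_mb, gpu_mb) for a given number of swapped blocks.

    Assumes every transformer block has the same size (an approximation).
    """
    if not 0 <= blocks_to_swap <= num_blocks:
        raise ValueError("blocks_to_swap must be between 0 and num_blocks")
    per_block_mb = total_mb / num_blocks
    cpu_mb = per_block_mb * blocks_to_swap  # held in system RAM
    gpu_mb = total_mb - cpu_mb              # stays resident in VRAM
    return cpu_mb, gpu_mb


# Figures from the "Block swap memory summary" in this thread.
TOTAL_MB = 13409.26
NUM_BLOCKS = 40

for swapped in (0, 19, 40):
    cpu_mb, gpu_mb = block_swap_split(TOTAL_MB, NUM_BLOCKS, swapped)
    print(f"swap {swapped:2d} blocks: ~{cpu_mb:9.2f} MB system RAM, ~{gpu_mb:9.2f} MB VRAM")
```

Every extra swapped block moves roughly 335 MB from VRAM to system RAM under this assumption, so raising the block count relieves the GPU only by putting more pressure on host memory.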

kijai avatar Nov 07 '25 13:11 kijai

Thank you for replying. I tried your advice but still get an OOM, this time with this error:

Initializing block swap: 100%|█████████████████████████████████████████████████████████| 40/40 [00:01<00:00, 30.03it/s]
----------------------
Block swap memory summary:
Transformer blocks on cpu: 6369.40MB
Transformer blocks on cuda:0: 7039.86MB
Total memory used by transformer blocks: 13409.26MB
Non-blocking memory transfer: False
----------------------
Using Mocha RoPE
Input sequence length: 349440
Sampling 441 frames at 480x832 with 6 steps
  0%|                                                                                            | 0/6 [00:00<?, ?it/s]Error during model prediction: Allocation on device
  0%|                                                                                            | 0/6 [00:04<?, ?it/s]
Error during sampling: Allocation on device
!!! Exception during processing !!! Allocation on device
Traceback (most recent call last):
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 3058, in process
    raise e
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 2940, in process
    noise_pred, noise_pred_ovi, self.cache_state = predict_with_cfg(
                                                   ~~~~~~~~~~~~~~~~^
        latent_model_input,
        ^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
        wananim_face_pixels=wananim_face_pixels, wananim_pose_latents=wananim_pose_latents, uni3c_data = uni3c_data, latent_model_input_ovi=latent_model_input_ovi, flashvsr_LQ_latent=flashvsr_LQ_latent,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 1550, in predict_with_cfg
    raise e
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\nodes_sampler.py", line 1421, in predict_with_cfg
    noise_pred_cond, noise_pred_ovi, cache_state_cond = transformer(
                                                        ~~~~~~~~~~~^
        context=positive_embeds,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        **base_params
        ^^^^^^^^^^^^^
    )
    ^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "F:\AI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\AI\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WanVideoWrapper\wanvideo\modules\model.py", line 2338, in forward
    x = [self.original_patch_embedding(u.unsqueeze(0).to(torch.float32)).to(x[0].dtype) for u in x]
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
torch.OutOfMemoryError: Allocation on device

Got an OOM, unloading all loaded models.
Prompt executed in 7.63 seconds
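
As a rough sanity check on the `Input sequence length: 349440` line above, the sketch below relates the logged value to the 480x832 resolution. It assumes 8x spatial VAE compression, 2x2 spatial patching, and 4x temporal compression, as commonly described for Wan-family models; these are assumptions, not values read from the wrapper's code, and both helper functions are hypothetical.

```python
def tokens_per_latent_frame(height: int, width: int, vae_stride: int = 8, patch: int = 2) -> int:
    """Tokens contributed by one latent frame (assumed 8x VAE stride, 2x2 patches)."""
    cell = vae_stride * patch
    if height % cell or width % cell:
        raise ValueError("height and width must be multiples of vae_stride * patch")
    return (height // cell) * (width // cell)


def latent_frames(pixel_frames: int, temporal_stride: int = 4) -> int:
    """Latent frames for a clip, assuming 4x temporal compression plus the first frame."""
    return (pixel_frames - 1) // temporal_stride + 1


per_frame = tokens_per_latent_frame(480, 832)
logged_sequence_length = 349440  # from the log above

print("tokens per latent frame:", per_frame)
print("latent-frame slices in logged sequence:", logged_sequence_length // per_frame)
print("latent frames for 441 pixel frames:", latent_frames(441))
print("latent frames for 81 pixel frames:", latent_frames(81))
```

Under these assumptions the logged sequence corresponds to about twice as many latent-frame slices as the 441 output frames alone would need, presumably because the workflow adds conditioning latents to the sequence. Either way, the sequence is several times longer than at 81 frames, and activation memory grows with it.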

TomiTom1234 avatar Nov 08 '25 07:11 TomiTom1234

It looks like you are trying to do 441 frames directly, without frame windowing or context windowing? That's just too much; the issue is how the workflow is set up.
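
For anyone unfamiliar with the idea, windowing means the sampler only ever sees a fixed-size slice of the video at a time. The sketch below is a generic illustration of splitting a long clip into overlapping windows; it is not the wrapper's implementation, and the window size (81) and overlap (16) are hypothetical values chosen for the example, not recommended node settings.

```python
def frame_windows(total_frames: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """Split [0, total_frames) into overlapping (start, end) ranges, end exclusive."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if total_frames <= 0:
        return []
    stride = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break  # the last window reaches the end of the clip
        start += stride
    return windows


# Hypothetical settings: 441 frames total, 81-frame windows, 16 frames of overlap.
for start, end in frame_windows(441, window=81, overlap=16):
    print(f"frames {start:3d}..{end - 1:3d} ({end - start} frames)")
```

With windows like these, peak memory is set by the window size rather than by the full 441 frames, and the overlapping frames are what allow neighbouring windows to be blended together.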

kijai avatar Nov 08 '25 08:11 kijai

I had the same issue, and this worked for me: https://www.youtube.com/watch?v=Um23MYVhoRY

Yogesh-DevHub avatar Nov 08 '25 09:11 Yogesh-DevHub