
WanImageToVideo OOM after updating

Open zwukong opened this issue 1 month ago • 16 comments

480p, 81fps: OOM. It did not OOM before; this started after updating to the latest version.

CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

torch 2.6, Python 3.12.
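The error text above suggests passing CUDA_LAUNCH_BLOCKING=1 so that the stack trace points at the failing call. A minimal sketch of one way to do that from Python is below; the helper name `cuda_debug_env` is hypothetical, and launching ComfyUI through `main.py` in the usage note is an assumption about the install layout, not something stated in this thread.

```python
import os


def cuda_debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    CUDA_LAUNCH_BLOCKING=1 is the variable named in the error message above.
    The variable must be set before the process that uses CUDA starts.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env
```

A possible use, assuming ComfyUI is started with `main.py` from its own folder, would be `subprocess.run([sys.executable, "main.py"], env=cuda_debug_env())`. Expect slower runs while the variable is set.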

zwukong avatar Oct 31 '25 06:10 zwukong

Hi, what is your hardware? Can you click "Show Report" and paste everything in the backtrace?

Thanks.

rattus128 avatar Oct 31 '25 14:10 rattus128

trace.txt

Same here with a txt2img workflow and the wan2.1_t2v_14B_fp8_scaled model when using the TorchCompileModel node, which was working fine before (yesterday). Disabling/bypassing TorchCompileModel runs without OOM (much slower, of course) despite using much more VRAM. This is with an RTX 3060 12GB.

httkl avatar Oct 31 '25 22:10 httkl

Post the full log; without it, your report is completely useless.

comfyanonymous avatar Nov 01 '25 00:11 comfyanonymous

I reverted and then installed a fresh copy of the latest package. WanImageToVideo no longer OOMs, but VAE Decode does, even at 480p. I have to use tiled VAE decode, which I think is much slower.
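For readers unfamiliar with why tiled decoding avoids OOM at the cost of speed, here is a toy sketch of the idea. It is not ComfyUI's implementation: `decode_tiled` and `decode_tile` are hypothetical names, the "latent" is a plain 2D list, and real tiled VAE decoding also overlaps and blends tiles, which is omitted here.

```python
def decode_tiled(latent, decode_tile, tile=2):
    """Decode a 2D grid in tile x tile patches instead of all at once.

    Only one patch is handed to decode_tile at a time, so peak working
    memory depends on the tile size rather than the full grid size.
    decode_tile is assumed to return a patch of the same shape.
    """
    height, width = len(latent), len(latent[0])
    out = [[None] * width for _ in range(height)]
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            # Cut out one patch (edge patches may be smaller than tile).
            patch = [row[x:x + tile] for row in latent[y:y + tile]]
            decoded = decode_tile(patch)
            # Write the decoded patch back into its place in the output.
            for dy, row in enumerate(decoded):
                out[y + dy][x:x + len(row)] = row
    return out
```

The extra cost comes from running the decoder many times on small patches instead of once on the whole input.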

zwukong avatar Nov 01 '25 03:11 zwukong

I just tested the WAN VAE in isolation at 480P and saw no change in OOM behaviour. It is broken on v0.3.65, however, but that should be fixed if you are up to date.

I just ran the full I2V 480P flow on my 3060, and VRAM use is very close to the ceiling, so even a small variation would push me to the tiler.

We need to know your hardware and see your full log to know more.
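When VRAM is this close to the ceiling, it can help to attach concrete numbers to a report. The sketch below is one way to do that, assuming PyTorch with CUDA is installed; `headroom_report` is a hypothetical helper, while `torch.cuda.mem_get_info()` is PyTorch's call that returns free and total device memory in bytes.

```python
GIB = 1024 ** 3


def headroom_report(free_bytes, total_bytes):
    """Summarise device memory as GiB figures plus a used percentage."""
    used_bytes = total_bytes - free_bytes
    return {
        "total_gib": round(total_bytes / GIB, 2),
        "used_gib": round(used_bytes / GIB, 2),
        "free_gib": round(free_bytes / GIB, 2),
        "used_pct": round(100 * used_bytes / total_bytes, 1),
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed; nothing to query.
    if torch is not None and torch.cuda.is_available():
        free, total = torch.cuda.mem_get_info()
        print(headroom_report(free, total))
```

The figures cover the whole device, so memory held by other applications is counted as used.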

rattus128 avatar Nov 01 '25 10:11 rattus128

There's a very fresh merge from Comfy that fixes fp8 + torch compile (6 hours old as of this writing). Please pull the latest git and give it a try. Thanks.

rattus128 avatar Nov 01 '25 10:11 rattus128

I tried Wan 2.2 and the sampler OOMs too, on the latest ComfyUI package. 4070 12G; the only launch parameter added is --fast. @rattus128

Traceback (most recent call last):
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1559, in sample
    return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
  File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 60, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1163, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 934, in partially_load
    raise e
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 931, in partially_load
    self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
  File "M:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 103, in load
    m.to(self.load_device).to(self.offload_device)
  File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1369, in to
    return self._apply(convert)
  File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 955, in _apply
    param_applied = fn(param)
  File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1355, in convert
    return t.to(
        device,
        dtype if t.is_floating_point() or t.is_complex() else None,
        non_blocking,
    )
  File "M:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
    new = super().to(*args, **kwargs)
  File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_tensor.py", line 1669, in __torch_function__
    ret = func(*args, **kwargs)
torch.AcceleratorError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

zwukong avatar Nov 01 '25 11:11 zwukong

After running it again, I got:

Traceback (most recent call last):
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1559, in sample
    return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
  File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1492, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 58, in sample
    sampler = comfy.samplers.KSampler(model, steps=steps, device=model.load_device, sampler=sampler_name, scheduler=scheduler, denoise=denoise, model_options=model.model_options)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1113, in __init__
    self.set_steps(steps, denoise)
  File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1134, in set_steps
    self.sigmas = self.calculate_sigmas(steps).to(self.device)
torch.AcceleratorError: CUDA error: pointer does not correspond to a registered memory region
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

zwukong avatar Nov 01 '25 12:11 zwukong

This isn't an OOM. It's a different kind of crash. There are 2 other issues tracking this, but please try:

https://github.com/city96/ComfyUI-GGUF/pull/355

if you would like to help out with testing the fix before it is merged.

rattus128 avatar Nov 02 '25 11:11 rattus128

No OOM now, but the VAE Decode node crashes. There are no logs; it just crashes. Launch flags: --use-sage-attention --fast pinned_memory

Image

zwukong avatar Nov 02 '25 11:11 zwukong

Same with Qwen. I think the way LoRAs are applied is broken; there is no OOM without applying LoRAs.

  • no LoRAs = 21 GB VRAM
  • any node that applies a LoRA = 31+ GB VRAM, OOM and lag

siraxe avatar Nov 04 '25 06:11 siraxe

I'm getting this exact error after updating.

CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

My hardware is a 4070ti, and I am crashing on literally any workflow that includes VAE Decode.

I cannot remember the last time I saw an OOM error in Comfy, it's been AGES - this is occurring on even the simplest SDXL workflows that are supposed to execute in 5 seconds flat.

Note that I am NOT using any LoRA loader.

I've rolled back my windows portable install by checking out the backup branch - problem solved instantly.

altoiddealer avatar Nov 06 '25 17:11 altoiddealer

The main error message is a few lines before where your paste starts. Just paste it all.

rattus128 avatar Nov 07 '25 04:11 rattus128

I just produced the "CUDA error: pointer does not correspond to a registered memory region" error with a minimal test program. Thanks!

rattus128 avatar Nov 07 '25 04:11 rattus128

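I ran into the same problem. ComfyUI used to run qwen-image without trouble, but after updating it no longer works and gets stuck at KSampler. I spent a day and a half setting up the environment. Rolling ComfyUI back to version 0.3.50 fixed it.

I suggest rolling back to 0.3.59 instead. The ControlNet loader node in 0.3.50 has a problem.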

moming975 avatar Nov 09 '25 09:11 moming975

The rollback to v0.3.59 works! Thank you!

hoyleontour avatar Nov 10 '25 19:11 hoyleontour