WanImageToVideo OOM after updating
480p, 81-frame generation OOMs; it did not OOM before, and it happens after updating to the latest version.
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
torch 2.6, Python 3.12.
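For anyone following the error message's advice: CUDA_LAUNCH_BLOCKING has to be set before torch initializes CUDA, so set it in the environment or at the very top of the entry point. A minimal sketch (generic Python, not tied to this report):

```python
import os
# Must happen before the first CUDA call, so set it before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous launches -> accurate stacktraces

import torch
print(torch.cuda.is_available())  # CUDA now initializes with blocking launches
```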
Hi, what is your hardware? Can you "show report" and paste everything in the backtrace?
Thanks.
Same here with a txt2img workflow and the wan2.1_t2v_14B_fp8_scaled model when using the TorchCompileModel node, which was working fine before (yesterday). Disabling/bypassing TorchCompileModel works without OOM (much slower, of course) despite using much more VRAM. This is with an RTX 3060 12GB.
Post the full log; without it your report is completely useless.
I reverted, then tried the latest package again: no OOM on WanImageToVideo, but OOM on VAE decode, even at 480p. I have to use VAE tiled decode, which is much slower.
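On why tiled decode trades speed for survival: decoding tile-by-tile bounds peak VRAM to one tile's activations instead of the whole frame's. A conceptual sketch (the function name and the omitted overlap blending are mine, not ComfyUI's actual VAEDecodeTiled implementation):

```python
import torch

def tiled_decode(vae_decode, latent, tile=64, overlap=8):
    # Decode a latent tensor (B, C, H, W) in overlapping spatial tiles.
    # vae_decode: any callable mapping a latent tile to decoded pixels.
    _, _, H, W = latent.shape
    stride = tile - overlap
    out = []
    for y in range(0, H, stride):
        for x in range(0, W, stride):
            part = latent[:, :, y:y + tile, x:x + tile]
            out.append(((y, x), vae_decode(part)))  # only one tile in flight at a time
    return out  # the caller stitches tiles and blends overlaps (omitted here)
```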
I just tested the WAN VAE in isolation at 480p with no variance in OOM behaviour. It is broken on v0.3.65, but that should be fixed if you are up to date.
I just did an I2V 480p full flow on my 3060 and the VRAM is very close to the ceiling, so even a small variation in VRAM would push me to the tiler.
We need to know your hardware and see your full log to know more.
There's a very fresh merge from Comfy fixing fp8 + torch compile (6 hours old as of this writing). Please pull the latest git and give it a try. Thanks.
I tried wan2.2; the sampler OOMs too. Latest ComfyUI package, 4070 12G; the only extra param is --fast. @rattus128
Traceback (most recent call last):
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
result = f(**inputs)
File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1559, in sample
return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1492, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
denoise=denoise, disable_noise=disable_noise, start_step=start_step, last_step=last_step,
force_full_denoise=force_full_denoise, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 60, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1163, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1053, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1035, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 984, in outer_sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
return executor.execute(model, noise_shape, conds, model_options=model_options)
~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 506, in model_load
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 934, in partially_load
raise e
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\model_patcher.py", line 931, in partially_load
self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 103, in load
m.to(self.load_device).to(self.offload_device)
~~~~^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1369, in to
return self._apply(convert)
~~~~~~~~~~~^^^^^^^^^
File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 955, in _apply
param_applied = fn(param)
File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1355, in convert
return t.to(
~~~~^
device,
^^^^^^^
dtype if t.is_floating_point() or t.is_complex() else None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
non_blocking,
^^^^^^^^^^^^^
)
^
File "M:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
new = super().to(*args, **kwargs)
File "M:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch_tensor.py", line 1669, in torch_function ret = func(*args, **kwargs)
torch.AcceleratorError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
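The failing frame is the GGUF loader moving the model onto the load device. For anyone trying to narrow this down, a small diagnostic sketch (plain PyTorch, nothing ComfyUI-specific) that helps tell a genuinely full device apart from allocator overhead:

```python
import torch

# Free/total memory as reported by the driver, independent of torch's allocator.
free, total = torch.cuda.mem_get_info()
# What torch's caching allocator holds vs. what live tensors actually use.
reserved = torch.cuda.memory_reserved()
allocated = torch.cuda.memory_allocated()

print(f"driver free:    {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
print(f"torch reserved: {reserved / 2**30:.2f} GiB, allocated: {allocated / 2**30:.2f} GiB")
```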
After running it again, I got:
Traceback (most recent call last):
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 510, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 324, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 298, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "M:\ComfyUI_windows_portable\ComfyUI\execution.py", line 286, in process_inputs
result = f(**inputs)
File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1559, in sample
return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
File "M:\ComfyUI_windows_portable\ComfyUI\nodes.py", line 1492, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
denoise=denoise, disable_noise=disable_noise, start_step=start_step, last_step=last_step,
force_full_denoise=force_full_denoise, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\sample.py", line 58, in sample
sampler = comfy.samplers.KSampler(model, steps=steps, device=model.load_device, sampler=sampler_name, scheduler=scheduler, denoise=denoise, model_options=model.model_options)
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1113, in init
self.set_steps(steps, denoise)
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^
File "M:\ComfyUI_windows_portable\ComfyUI\comfy\samplers.py", line 1134, in set_steps
self.sigmas = self.calculate_sigmas(steps).to(self.device)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: pointer does not correspond to a registered memory region
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This isn't an OOM. It's a different kind of crash. There are 2 other issues tracking this, but please try:
https://github.com/city96/ComfyUI-GGUF/pull/355
if you would like to help out with pre-merge fix testing.
No OOM, but the VAE decode node crashes: no logs, just a crash. Args: --use-sage-attention --fast pinned_memory
Same with qwen. I think the way LoRAs are applied is broken; no OOM without applying LoRAs.
- no LoRAs = 21 GB VRAM
- any node that applies a LoRA = 31+ GB VRAM, OOM, lag
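If the LoRA theory is right, the spike would come from materializing the low-rank update at the full weight's size while patching. A rough illustration of that mechanism (an assumption on my part, not ComfyUI's actual patch code; the shapes are made up):

```python
import torch

# A large base weight plus a rank-16 LoRA pair, all resident on the GPU.
W = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
B = torch.randn(4096, 16, device="cuda", dtype=torch.float16)
A = torch.randn(16, 4096, device="cuda", dtype=torch.float16)
alpha = 0.75

delta = B @ A                  # temporary the same size as W itself
W_patched = W + alpha * delta  # a second full-size tensor before the old W is freed
```

Repeated across hundreds of layers, those transient full-size tensors add up quickly if they are not freed aggressively.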
I'm getting this exact error after updating.
CUDA error: out of memory CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1 Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
My hardware is a 4070 Ti, and I am crashing on literally any workflow that includes VAE Decode.
I cannot remember the last time I saw an OOM error in Comfy, it's been AGES - this is occurring on even the simplest SDXL workflows that are supposed to execute in 5 seconds flat.
Note that I am NOT using any LoRA loader.
I've rolled back my Windows portable install by checking out the backup branch - problem solved instantly.
The main error message is a few lines before where your paste starts. Just paste it all.
I just produced the "CUDA error: pointer does not correspond to a registered memory region" error with a minimal test program. Thanks!
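For reference, a guess at what such a minimal program might look like (this is an assumption, not the actual test program, and it assumes torch.cuda.cudart() exposes cudaHostUnregister): unregistering host memory that was never registered returns cudaErrorHostMemoryNotRegistered, whose message is exactly this string.

```python
import torch

torch.cuda.init()
t = torch.empty(1024)  # ordinary pageable CPU tensor, never host-registered
# Hypothetical repro: cudaHostUnregister on an unregistered pointer should
# return the "pointer does not correspond to a registered memory region" error.
err = torch.cuda.cudart().cudaHostUnregister(t.data_ptr())
print(err)  # expect a nonzero cudaError_t
```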
I ran into the same problem. The original ComfyUI ran qwen-image fine; after updating it can't, and gets stuck at the KSampler. I spent a day and a half setting up the environment. Rolling ComfyUI back to 0.3.50 fixed it for me.
I suggest rolling back to 0.3.59 instead; 0.3.50's ControlNet loader node has problems.
The rollback to v0.3.59 works! Thanks!