CUDA error: invalid argument
Custom Node Testing
- [ ] I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Your question
This error occurred while using PuLID (PuLID_Flux_II).
However, it is not reliably reproducible, which makes it difficult to pin down:
- Sometimes image generation proceeds normally without the error occurring
- Sometimes, despite the error appearing, repeated attempts eventually succeed in generating an image
- Sometimes disabling the node allows normal image generation
- Sometimes disabling the node does not resolve the issue
If this is a clear bug, I would appreciate it being fixed promptly. However, if there is a way I can resolve this myself without waiting for a fix, please do let me know.
The operating environment is as follows:
- Python: 3.10.11 (within Stability Matrix)
- PyTorch: 2.9.1+cu128 (even when I set it to cu130, it reverts to cu128 for some reason upon update)
- OS: Windows 10
- VRAM: 12GB
- DRAM: 64GB
Logs
!!! Exception during processing !!! CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 515, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 291, in process_inputs
result = f(**inputs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1538, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 1505, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\sample.py", line 60, in sample
samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 1163, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 1053, in sample
return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 1035, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 113, in execute
return self.wrappers[self.idx](self, *args, **kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\ComfyUI_PuLID_Flux_ll\pulidflux.py", line 625, in pulid_outer_sample_wrappers_with_override
out = wrapper_executor(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, **kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 105, in __call__
return new_executor.execute(*args, **kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\samplers.py", line 984, in outer_sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
return executor.execute(model, noise_shape, conds, model_options=model_options)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 701, in load_models_gpu
loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 506, in model_load
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 536, in model_use_more_vram
return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 952, in partially_load
self.partially_unload(self.offload_device, -extra_memory, force_patch_weights=force_patch_weights)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 901, in partially_unload
m.to(device_to)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1371, in to
return self._apply(convert)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\ops.py", line 639, in _apply
self.register_parameter(key, torch.nn.Parameter(fn(param), requires_grad=False))
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
return t.to(
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 205, in __torch_dispatch__
return _GENERIC_UTILS[func](func, args, kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 321, in generic_to_dtype_layout
return _handle_device_transfer(
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 272, in _handle_device_transfer
new_q_data = qt._qdata.to(device=target_device)
torch.AcceleratorError: CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Other
No response
A CUDA EINVAL (invalid argument) during partial unload can be a product of using an old, incompatible version of a GGUF loader. `--disable-pinned-memory` works around it but can cost you a lot of performance. Make sure you are on the latest version of ComfyUI-GGUF if you are doing GGUF anywhere in the flow.
If it's not GGUF, we need full information on the bug, as these are hard to pinpoint: the full workflow, and the entire log pasted at least from "got prompt" onward.
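For reference, a minimal sketch of applying both the workaround flag and the debugging hint from the traceback, assuming ComfyUI is launched directly via main.py from a Windows command prompt (Stability Matrix users would add the flag under the package's launch options instead, and the exact entry point may differ):

```
REM Optional: make CUDA errors synchronous so the stack trace points at the real call
set CUDA_LAUNCH_BLOCKING=1

REM Workaround: run without pinned host memory (avoids the crash, may cost
REM model load/offload performance)
python main.py --disable-pinned-memory
```

Once the GGUF loader is up to date, the flag can be dropped again to regain the pinned-memory performance.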
Not the OP, but I had the same error. I have been unable to reproduce it using a different GGUF loader; in my case the Unet Loader (GGUF) custom node works just fine.
The problem loader is:
- name: loadergguf
- version: 2.6.5
- cnr_id: gguf
- ue properties: {"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}
Is there a more detailed id I can report?
Update: the default loader that appears highlighted at the top of the list when searching for GGUF loaders immediately returned an error too. Version 2.7.5 looks like the default?
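If that 2.6.5 loader is the culprit, the advice above is to move to the latest ComfyUI-GGUF. A minimal sketch of updating it, assuming the node was installed via git clone into the default custom_nodes folder (ComfyUI-Manager users can update from the Manager UI instead):

```
REM Run from the ComfyUI folder, then restart ComfyUI so the updated loader is picked up
cd custom_nodes\ComfyUI-GGUF
git pull
```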
Thank you, and I apologise for the lack of explanation. The error occurs when processing moves from sampling to VAE decoding; that is, the sampling itself completes normally even when the error occurs. Furthermore, in the current situation, once an error occurs it persists unless ComfyUI is restarted; disabling the node or changing the model does not resolve it. Also, GGUF is not being used.
In this test, after restarting ComfyUI following an error, none of the runs triggered the error again, and time constraints prevented more thorough testing. However, memory usage may be what determines whether a run errors out or completes normally.
In this test, the error occurred with the combination CFG=1, Guidance=3.5, euler + simple, with a negative prompt. Yet even the same combination failed to reproduce it after a restart and completed normally. Furthermore, whether using CFG>1.0 (with a negative prompt) or CFG=1 (without one), errors sometimes occur and sometimes do not in either case. The model used in this test was the FP8 Full Model (*), but during normal usage there is no particular difference in error occurrence between FP16 and FP8, or between the Full Model and the Pruned Model.
- https://civitai.com/models/1032613?modelVersionId=1158144
[On Error]
got prompt 10:39:48
Requested to load Flux
loaded partially; 8480.79 MB usable, 8372.35 MB loaded, 2980.68 MB offloaded, 108.02 MB buffer reserved, lowvram patches: 0
Unloaded partially: 765.10 MB freed, 324.06 MB remains loaded, 36.00 MB buffer reserved, lowvram patches: 0
100%|██████████| 30/30 [02:12<00:00, 4.42s/it]
Requested to load AutoencodingEngine
!!! Exception during processing !!! CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 515, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 291, in process_inputs
result = f(**inputs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\nodes.py", line 298, in decode
images = vae.decode(samples["samples"])
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\sd.py", line 774, in decode
model_management.load_models_gpu([self.patcher], memory_required=memory_used, force_full_load=self.disable_offload)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 671, in load_models_gpu
free_memory(total_memory_required[device] * 1.1 + extra_mem, device)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 603, in free_memory
if current_loaded_models[i].model_unload(memory_to_free):
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_management.py", line 526, in model_unload
freed = self.model.partially_unload(self.model.offload_device, memory_to_free)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\model_patcher.py", line 904, in partially_unload
m.to(device_to)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1371, in to
return self._apply(convert)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\ops.py", line 639, in _apply
self.register_parameter(key, torch.nn.Parameter(fn(param), requires_grad=False))
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
return t.to(
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 205, in torch_dispatch
return _GENERIC_UTILS[func](func, args, kwargs)
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 321, in generic_to_dtype_layout
return _handle_device_transfer(
File "Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\comfy\quant_ops.py", line 272, in _handle_device_transfer
new_q_data = qt._qdata.to(device=target_device)
torch.AcceleratorError: CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[On success-the first time]
got prompt 10:57:52
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
[MultiGPU Core Patching] text_encoder_device_patched returning device: cpu (current_text_encoder_device=cpu)
Requested to load FluxClipModel_
loaded completely; 95367431640625005117571072.00 MB usable, 5013.38 MB loaded, full load: True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float32
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\models\insightface\models\antelopev2\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\models\insightface\models\antelopev2\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\models\insightface\models\antelopev2\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\models\insightface\models\antelopev2\glintr100.onnx recognition ['None', 3, 112, 112] 127.5 127.5
Applied providers: ['CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}}
find model: Z:\WorkSpace\StabilityMatrix\Data\Packages\ComfyUI\models\insightface\models\antelopev2\scrfd_10g_bnkps.onnx detection [1, 3, '?', '?'] 127.5 128.0
set det-size: (640, 640)
Loaded EVA02-CLIP-L-14-336 model config.
Shape of rope freq: torch.Size([576, 64])
Loading pretrained EVA02-CLIP-L-14-336 weights (eva_clip).
incompatible_keys.missing_keys: ['visual.rope.freqs_cos', 'visual.rope.freqs_sin', 'visual.blocks.0.attn.rope.freqs_cos', 'visual.blocks.0.attn.rope.freqs_sin', 'visual.blocks.1.attn.rope.freqs_cos', 'visual.blocks.1.attn.rope.freqs_sin', 'visual.blocks.2.attn.rope.freqs_cos', 'visual.blocks.2.attn.rope.freqs_sin', 'visual.blocks.3.attn.rope.freqs_cos', 'visual.blocks.3.attn.rope.freqs_sin', 'visual.blocks.4.attn.rope.freqs_cos', 'visual.blocks.4.attn.rope.freqs_sin', 'visual.blocks.5.attn.rope.freqs_cos', 'visual.blocks.5.attn.rope.freqs_sin', 'visual.blocks.6.attn.rope.freqs_cos', 'visual.blocks.6.attn.rope.freqs_sin', 'visual.blocks.7.attn.rope.freqs_cos', 'visual.blocks.7.attn.rope.freqs_sin', 'visual.blocks.8.attn.rope.freqs_cos', 'visual.blocks.8.attn.rope.freqs_sin', 'visual.blocks.9.attn.rope.freqs_cos', 'visual.blocks.9.attn.rope.freqs_sin', 'visual.blocks.10.attn.rope.freqs_cos', 'visual.blocks.10.attn.rope.freqs_sin', 'visual.blocks.11.attn.rope.freqs_cos', 'visual.blocks.11.attn.rope.freqs_sin', 'visual.blocks.12.attn.rope.freqs_cos', 'visual.blocks.12.attn.rope.freqs_sin', 'visual.blocks.13.attn.rope.freqs_cos', 'visual.blocks.13.attn.rope.freqs_sin', 'visual.blocks.14.attn.rope.freqs_cos', 'visual.blocks.14.attn.rope.freqs_sin', 'visual.blocks.15.attn.rope.freqs_cos', 'visual.blocks.15.attn.rope.freqs_sin', 'visual.blocks.16.attn.rope.freqs_cos', 'visual.blocks.16.attn.rope.freqs_sin', 'visual.blocks.17.attn.rope.freqs_cos', 'visual.blocks.17.attn.rope.freqs_sin', 'visual.blocks.18.attn.rope.freqs_cos', 'visual.blocks.18.attn.rope.freqs_sin', 'visual.blocks.19.attn.rope.freqs_cos', 'visual.blocks.19.attn.rope.freqs_sin', 'visual.blocks.20.attn.rope.freqs_cos', 'visual.blocks.20.attn.rope.freqs_sin', 'visual.blocks.21.attn.rope.freqs_cos', 'visual.blocks.21.attn.rope.freqs_sin', 'visual.blocks.22.attn.rope.freqs_cos', 'visual.blocks.22.attn.rope.freqs_sin', 'visual.blocks.23.attn.rope.freqs_cos', 'visual.blocks.23.attn.rope.freqs_sin']
Loading PuLID-Flux model.
Found quantization metadata version 1
Detected mixed precision quantization
Using mixed precision operations
model weight dtype torch.bfloat16, manual cast: torch.bfloat16
model_type FLUX
unet unexpected: ['scaled_fp8']
Requested to load PulidFluxModel
loaded completely; 95367431640625005117571072.00 MB usable, 1085.10 MB loaded, full load: True
Requested to load Flux
loaded partially; 8484.78 MB usable, 8376.65 MB loaded, 2976.38 MB offloaded, 108.02 MB buffer reserved, lowvram patches: 0
100%|██████████| 30/30 [02:01<00:00, 4.04s/it]
Requested to load AutoencodingEngine
Unloaded partially: 4089.86 MB freed, 4286.79 MB remains loaded, 162.11 MB buffer reserved, lowvram patches: 0
loaded completely; 855.80 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 256.18 seconds
[On success-the second time]
got prompt 11:03:27
loaded partially; 8480.79 MB usable, 8372.35 MB loaded, 2980.68 MB offloaded, 108.02 MB buffer reserved, lowvram patches: 0
100%|██████████| 30/30 [02:00<00:00, 4.00s/it]
Requested to load AutoencodingEngine
Unloaded partially: 4085.55 MB freed, 4286.79 MB remains loaded, 162.11 MB buffer reserved, lowvram patches: 0
loaded completely; 853.80 MB usable, 159.87 MB loaded, full load: True
Prompt executed in 167.23 seconds