
Injecting quantized module !!! Exception during processing !!! CUDA error: out of memory Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

usupehmat opened this issue 8 months ago • 2 comments

Please help with this. I tried reinstalling nunchaku and deepcompressor, but I get the same error.

```
Using xformers attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
2025-04-07 01:58:30.960384
GPU 0 (NVIDIA GeForce RTX 3070) Memory: 8191.375 MiB
2025-04-07 01:58:30.960884
VRAM < 14GiB, enable CPU offload
[2025-04-07 01:58:31.623] [info] Initializing QuantizedFluxModel on device 0
[2025-04-07 01:58:31.623] [info] Layer offloading enabled
[2025-04-07 01:58:31.701] [info] Loading weights from S:\ComfyUI_windows_portable\ComfyUI\models\diffusion_models\svdq-int4-flux.1-dev\transformer_blocks.safetensors
[2025-04-07 01:58:31.714] [info] Done.
2025-04-07 01:58:31.714391
Injecting quantized module
!!! Exception during processing !!! CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

```
Traceback (most recent call last):
  File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
    process_inputs(input_dict, i)
  File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "S:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-nunchaku\nodes\models\flux.py", line 280, in load_model
    self.transformer = NunchakuFluxTransformer2dModel.from_pretrained(
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\nunchaku\models\transformers\transformer_flux.py", line 306, in from_pretrained
    transformer.to_empty(device=device)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1192, in to_empty
    return self._apply(
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
    module._apply(fn)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
    param_applied = fn(param)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1193, in <lambda>
    lambda t: torch.empty_like(t, device=device), recurse=recurse
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_prims_common\wrappers.py", line 273, in _fn
    result = fn(*args, **kwargs)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\_refs\__init__.py", line 4919, in empty_like
    return torch.empty_permuted(
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

```
  File "S:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 206, in get_total_memory
    _, mem_total_cuda = torch.cuda.mem_get_info(dev)
  File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\memory.py", line 712, in mem_get_info
    return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

S:\ComfyUI_windows_portable>pause
Press any key to continue . . .
```

usupehmat avatar Apr 06 '25 18:04 usupehmat

I'm running into the same error pretty consistently with a 24 GB 4090 as well, which never had problems before 2.0. It's hit-or-miss when it triggers, but once it does, it refuses to process anything (even after restarting ComfyUI), constantly reporting out-of-memory during loading.

Not sure what's going on; it's weird that it only happens sometimes. Also, watching active VRAM when the error occurs, my GPU only reaches about 20–25% VRAM use.
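For cross-checking what the driver itself reports, `torch.cuda.mem_get_info` is the same call ComfyUI's `model_management.py` uses (it appears in the traceback above). A minimal sketch; the `format_vram` helper is just for illustration, and the `torch` call needs a working CUDA context:

```python
def format_vram(free_bytes: int, total_bytes: int) -> str:
    """Format the (free, total) byte pair returned by
    torch.cuda.mem_get_info() as a readable MiB string."""
    mib = 1024 * 1024
    return f"{free_bytes / mib:.0f} MiB free / {total_bytes / mib:.0f} MiB total"

# Usage (requires CUDA; errors here mean the driver itself is refusing,
# which matches the mem_get_info failure in the log above):
#   import torch
#   free, total = torch.cuda.mem_get_info(0)
#   print(format_vram(free, total))
```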

Usually triggers on the Nunchaku FLUX DiT Loader, occasionally on the Nunchaku Text Encoder Loader. Waiting for an extended period (15 - 20 min) without trying to run/queue anything seems to fix the issue for a while?

Currently not having the issue, will try to copy-paste my log here as well next time it comes up (probably won't be more than an hour before it shows up again).

Beowulfe222 avatar Apr 06 '25 22:04 Beowulfe222

OOM errors occur frequently without any apparent reason. Some of my workflows are quite complex and use larger batch sizes, yet they run fine. However, simpler workflows with smaller batch sizes sometimes trigger out-of-memory (OOM) errors or shared memory issues, which drastically slow down sampling.

In comparison, FP8 or even FP16 Flux don't behave this way (maybe ComfyUI optimizes memory usage by automatically offloading when needed). This is a major issue because I expected the SVDQ technique to save time, but if OOM errors happen randomly and often, it defeats the purpose.

AiD-teng avatar Apr 08 '25 08:04 AiD-teng

Could you upgrade your CUDA driver and try setting the environment variable `NUNCHAKU_LOAD_METHOD` to `READ` or `READNOPIN` to see if the error persists?
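For anyone unsure how to apply this: on the Windows portable build you can add `set NUNCHAKU_LOAD_METHOD=READ` to the launcher `.bat` before the line that starts Python, or set it from Python before nunchaku loads. A sketch; `READ`/`READNOPIN` are the values named in the comment above, and the variable must be set before the weight files are read:

```python
import os

# Must be in the environment before nunchaku loads its weights;
# READ / READNOPIN select an alternative file-loading path.
os.environ["NUNCHAKU_LOAD_METHOD"] = "READ"

print(os.environ["NUNCHAKU_LOAD_METHOD"])
```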

For reference, see the related issues and PRs: #206, #311, #276.

lmxyy avatar Apr 26 '25 04:04 lmxyy

I'll close this issue for now. If the error persists or you encounter any other problems, feel free to open a new issue.

lmxyy avatar Apr 27 '25 03:04 lmxyy