nunchaku
Injecting quantized module !!! Exception during processing !!! CUDA error: out of memory Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Please help with this. I've tried reinstalling nunchaku and deepcompressor, but I get the same error.
Using xformers attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
2025-04-07 01:58:30.960384 GPU 0 (NVIDIA GeForce RTX 3070) Memory: 8191.375 MiB
2025-04-07 01:58:30.960884 VRAM < 14GiB,enable CPU offload
[2025-04-07 01:58:31.623] [info] Initializing QuantizedFluxModel on device 0
[2025-04-07 01:58:31.623] [info] Layer offloading enabled
[2025-04-07 01:58:31.701] [info] Loading weights from S:\ComfyUI_windows_portable\ComfyUI\models\diffusion_models\svdq-int4-flux.1-dev\transformer_blocks.safetensors
[2025-04-07 01:58:31.714] [info] Done.
2025-04-07 01:58:31.714391 Injecting quantized module
!!! Exception during processing !!! CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Traceback (most recent call last):
File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "S:\ComfyUI_windows_portable\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-nunchaku\nodes\models\flux.py", line 280, in load_model
self.transformer = NunchakuFluxTransformer2dModel.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\huggingface_hub\utils_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\nunchaku\models\transformers\transformer_flux.py", line 306, in from_pretrained
transformer.to_empty(device=device)
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1192, in to_empty
return self._apply(
^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 900, in _apply
module._apply(fn)
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 927, in _apply
param_applied = fn(param)
^^^^^^^^^
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1193, in TORCH_USE_CUDA_DSA to enable device-side assertions.
File "S:\ComfyUI_windows_portable\ComfyUI\comfy\model_management.py", line 206, in get_total_memory
_, mem_total_cuda = torch.cuda.mem_get_info(dev)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "S:\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\cuda\memory.py", line 712, in mem_get_info
return torch.cuda.cudart().cudaMemGetInfo(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: out of memory
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
I'm running into the same error pretty consistently with a 24 GB 4090 as well, which never had problems before 2.0. It's hit-or-miss when it triggers, but once it does, it refuses to process anything (even after restarting ComfyUI), constantly reporting that it's running out of memory when loading.
I'm not sure what's going on; it's odd that it only happens sometimes. Also, watching active VRAM use when it happens, my GPU only reaches about 20-25% VRAM usage when the error occurs.
It usually triggers on the Nunchaku FLUX DiT Loader, occasionally on the Nunchaku Text Encoder Loader. Waiting for an extended period (15-20 minutes) without running or queueing anything seems to fix the issue for a while.
I'm not having the issue at the moment; I'll copy-paste my log here next time it comes up (probably within an hour of it showing up again).
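To check whether the reported OOM matches actual usage, a hypothetical diagnostic (not part of nunchaku or ComfyUI) is to call the same `torch.cuda.mem_get_info` that fails in the traceback above, right before loading, and compare it with what your GPU monitor shows:

```python
# Diagnostic sketch: query free/total VRAM via the same CUDA runtime call
# (torch.cuda.mem_get_info) that raises in the traceback above.
import torch


def log_vram(tag: str, device: int = 0):
    """Return (free_mib, total_mib), or None when CUDA is unavailable."""
    if not torch.cuda.is_available():
        print(f"[{tag}] CUDA not available")
        return None
    free, total = torch.cuda.mem_get_info(device)
    print(f"[{tag}] free {free / 2**20:.0f} MiB of {total / 2**20:.0f} MiB")
    return free / 2**20, total / 2**20
```

If this call itself raises "CUDA error: out of memory" while the monitor shows only 20-25% usage, the problem is in the CUDA runtime/driver state rather than in model size, which would be consistent with the driver-upgrade suggestion below.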
OOM errors occur frequently without any apparent reason. Some of my workflows are quite complex and use larger batch sizes, yet they run fine. However, simpler workflows with smaller batch sizes sometimes trigger out-of-memory (OOM) errors or shared memory issues, which drastically slow down sampling.
In comparison, FP8 or even FP16 Flux don't behave this way (maybe ComfyUI optimizes memory usage by automatically offloading when needed). This is a major issue because I expected the SVDQ technique to save time; if OOM errors happen randomly and often, it defeats the purpose.
Could you upgrade your CUDA driver version and try setting the environment variable NUNCHAKU_LOAD_METHOD to READ or READNOPIN to see if the error persists?
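As a sketch of that suggestion (assuming `NUNCHAKU_LOAD_METHOD` is read at import time, so it must be set before nunchaku is imported; the values `READ` and `READNOPIN` are the ones named above):

```python
# Set the load-method override before nunchaku (or ComfyUI) is imported.
# "READ" / "READNOPIN" are the values suggested by the maintainer above;
# which one helps may depend on your driver and pinned-memory behavior.
import os

os.environ["NUNCHAKU_LOAD_METHOD"] = "READ"
```

With the portable ComfyUI build, an equivalent approach is adding `set NUNCHAKU_LOAD_METHOD=READ` to the launcher `.bat` before the line that starts Python, since child processes inherit the environment.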
For reference, see the related issues and PRs: #206, #311, #276.
I'll close this issue for now. If the error persists or you encounter any other problems, feel free to open a new issue.