
Issues with CUDA out of memory

Open · CimaCha opened this issue 5 months ago • 1 comment

I get a CUDA out-of-memory error on a 4070 Super.

I've worked with this UI for a year and it was fine, but after some updates (I'm not sure exactly which) I started seeing this problem, and the log didn't help me much:

To create a public link, set `share=True` in `launch()`.
Startup time: 14.5s (prepare environment: 2.4s, launcher: 0.4s, import torch: 6.1s, initialize shared: 0.1s, other imports: 0.2s, load scripts: 1.6s, create ui: 2.1s, gradio launch: 1.5s).
Environment vars changed: {'stream': True, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 91.66% GPU memory (11257.00 MB) to load weights, and use 8.34% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 91.66% GPU memory (11257.00 MB) to load weights, and use 8.34% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': True, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 91.66% GPU memory (11257.00 MB) to load weights, and use 8.34% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'D:\\Forge\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 0.7s (unload existing model: 0.2s, forge model load: 0.5s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11041.00 MB, Model Require: 5154.62 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 4862.38 MB, All loaded to GPU.
Moving model(s) has taken 2.46 seconds
Traceback (most recent call last):
  File "D:\Forge\webui\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "D:\Forge\webui\modules\txt2img.py", line 131, in txt2img_function
    processed = processing.process_images(p)
  File "D:\Forge\webui\modules\processing.py", line 842, in process_images
    res = process_images_inner(p)
  File "D:\Forge\webui\modules\processing.py", line 962, in process_images_inner
    p.setup_conds()
  File "D:\Forge\webui\modules\processing.py", line 1601, in setup_conds
    super().setup_conds()
  File "D:\Forge\webui\modules\processing.py", line 505, in setup_conds
    self.c = self.get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, total_steps, [self.cached_c], self.extra_network_data)
  File "D:\Forge\webui\modules\processing.py", line 474, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
  File "D:\Forge\webui\modules\prompt_parser.py", line 262, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps, hires_steps, use_old_scheduling)
  File "D:\Forge\webui\modules\prompt_parser.py", line 189, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "D:\Forge\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Forge\webui\backend\diffusion_engine\flux.py", line 87, in get_learned_conditioning
    cond_t5 = self.text_processing_engine_t5(prompt)
  File "D:\Forge\webui\backend\text_processing\t5_engine.py", line 139, in __call__
    z = self.process_tokens([tokens], [multipliers])[0]
  File "D:\Forge\webui\backend\text_processing\t5_engine.py", line 150, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "D:\Forge\webui\backend\text_processing\t5_engine.py", line 62, in encode_with_transformers
    z = self.text_encoder(
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\nn\t5.py", line 207, in forward
    return self.encoder(x, *args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\nn\t5.py", line 188, in forward
    x, past_bias = l(x, mask, past_bias)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\nn\t5.py", line 164, in forward
    x, past_bias = self.layer[0](x, mask, past_bias)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\nn\t5.py", line 151, in forward
    output, past_bias = self.SelfAttention(self.layer_norm(x), mask=mask, past_bias=past_bias)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\nn\t5.py", line 129, in forward
    k = self.k(x)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Forge\system\python\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Forge\webui\backend\operations.py", line 151, in forward
    weight, bias, signal = weights_manual_cast(self, x)
  File "D:\Forge\webui\backend\operations.py", line 79, in weights_manual_cast
    weight, bias = get_weight_and_bias(layer, weight_args, bias_args, weight_fn=weight_fn, bias_fn=bias_fn)
  File "D:\Forge\webui\backend\operations.py", line 35, in get_weight_and_bias
    weight = weight.to(**weight_args)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU
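
For anyone debugging a similar OOM, a minimal diagnostic sketch (plain PyTorch, run in the webui's Python environment; it is a standalone check, not part of Forge, and the device index 0 is an assumption) that shows how much VRAM is actually available before generation:

import torch

# Query the driver and PyTorch's caching allocator for cuda:0.
device = torch.device("cuda:0")
free, total = torch.cuda.mem_get_info(device)    # bytes, as the driver sees them
allocated = torch.cuda.memory_allocated(device)  # bytes held by live tensors
reserved = torch.cuda.memory_reserved(device)    # bytes cached by the allocator

mib = 1024 ** 2
print(f"total:     {total / mib:8.0f} MiB")
print(f"free:      {free / mib:8.0f} MiB")
print(f"allocated: {allocated / mib:8.0f} MiB")
print(f"reserved:  {reserved / mib:8.0f} MiB")

If "free" is already far below the [GPU Setting] figure in the log, something else (the OS, browser, another app) is holding VRAM before Forge even starts allocating.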

What can I do about this?

CimaCha · Jul 06 '25 22:07

Reduce the GPU Weights slider at the top. The model is only asking for ~6 GB for the weights, but the slider is set to allow up to 11 GB, leaving you with about 1 GB (minus whatever your OS takes) for inference. Set the value to around 7,000-8,000 and it should work fine.
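
To make the arithmetic concrete, a quick worked sketch using the numbers from the log above (plain Python; the helper function is hypothetical, not Forge's actual allocator logic):

# Forge's reported split: 11257 MiB for weights + 1024 MiB for computation.
total_vram = 11257 + 1024  # MiB usable by Forge on this 12 GB card

def inference_headroom(gpu_weights_mib: int) -> int:
    # VRAM left for activations once the weights budget takes its share.
    return total_vram - gpu_weights_mib

print(inference_headroom(11257))  # 1024 MiB -> too tight once the OS takes its cut
print(inference_headroom(8000))   # 4281 MiB -> comfortable headroom for inference

At the default setting nearly everything is earmarked for weights, so even the small 64 MiB allocation during text encoding has nowhere to go; lowering GPU Weights trades a little model offloading for several gigabytes of compute headroom.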

MisterChief95 · Jul 11 '25 20:07