stable-diffusion-webui-forge

(Solved, but slow, with the daily update?) [OOM with NeverOOM checked] RuntimeError: CUDA error: out of memory. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. Time taken: 45.4 sec. A: 0.07 GB, R: 0.09 GB, Sys: 0.9/4 GB (22.3%)

Giribot opened this issue 1 year ago · 1 comment

Hello! Forge with Flux doesn't work this morning, although it worked yesterday. NeverOOM doesn't seem to be working (?).

Thanks!

On my HDD (D:), I have 110 GB free.

Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-527-g720b80da
Commit hash: 720b80daea9070aa7896d8ce2c4a6cf8daf927cf
D:\Data\Packages\Stable Diffusion WebUI Forge\extensions-builtin\forge_legacy_preprocessors\install.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
D:\Data\Packages\Stable Diffusion WebUI Forge\extensions-builtin\sd_forge_controlnet\install.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
Launching Web UI with arguments: --share --cuda-malloc --cuda-stream --gradio-allowed-path 'D:\Data\Images'
Using cudaMallocAsync backend.
Total VRAM 4096 MB, total RAM 20226 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3050 Ti Laptop GPU : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: True
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: D:\Data\Packages\Stable Diffusion WebUI Forge\models\ControlNetPreprocessor
2024-09-12 08:37:08,249 - ControlNet - INFO - ControlNet UI callback registered.
*** Error executing callback ui_tabs_callback for D:\Data\Packages\Stable Diffusion WebUI Forge\extensions\model_preset_manager\scripts\main.py
    Traceback (most recent call last):
      File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\script_callbacks.py", line 283, in ui_tabs_callback
        res += c.callback() or []
      File "D:\Data\Packages\Stable Diffusion WebUI Forge\extensions\model_preset_manager\scripts\main.py", line 463, in on_ui_tabs
        model_generation_data = gr.Textbox(label = model_generation_data_label_text(), value = "", lines = 3, elem_id = "def_model_gen_data_textbox").style(show_copy_button=True)
    AttributeError: 'Textbox' object has no attribute 'style'
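This `AttributeError` comes from the `model_preset_manager` extension using the old Gradio 3 `.style()` method, which newer Gradio versions removed; styling options moved into the component constructor. A hedged sketch of the likely fix for the extension's `main.py` line 463 (a patch fragment, not verified against the extension's repository — `gr` and `model_generation_data_label_text` are the names from the traceback):

```python
# Before (raises AttributeError: 'Textbox' object has no attribute 'style'):
#   model_generation_data = gr.Textbox(
#       label=model_generation_data_label_text(), value="", lines=3,
#       elem_id="def_model_gen_data_textbox").style(show_copy_button=True)

# After — show_copy_button is a constructor keyword in current Gradio:
model_generation_data = gr.Textbox(
    label=model_generation_data_label_text(),
    value="",
    lines=3,
    elem_id="def_model_gen_data_textbox",
    show_copy_button=True,  # moved out of the removed .style() call
)
```

This error only breaks that extension's tab; it is unrelated to the OOM below.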

---
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\fenrisxlFlux_fenrisFluxV1.safetensors', 'hash': 'c90651e3'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://e052dd2a3ac129e222.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Startup time: 35.9s (prepare environment: 7.7s, import torch: 11.2s, initialize shared: 0.2s, other imports: 0.9s, list SD models: 0.1s, load scripts: 3.6s, create ui: 2.6s, gradio launch: 9.6s).
Environment vars changed: {'stream': False, 'inference_memory': 950.0, 'pin_shared_memory': True}
[GPU Setting] You will use 76.80% GPU memory (3145.00 MB) to load weights, and use 23.20% GPU memory (950.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 74.99% GPU memory (3071.00 MB) to load weights, and use 25.01% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 950.0, 'pin_shared_memory': True}
[GPU Setting] You will use 76.80% GPU memory (3145.00 MB) to load weights, and use 23.20% GPU memory (950.00 MB) to do matrix computation.
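For reference, the `[GPU Setting]` split above appears to be simply total VRAM minus the reserved inference budget — a guess from the numbers in this log (Forge seems to reserve one extra megabyte, which would explain 3145 vs. 3146 MB):

```python
# Reproduce the "[GPU Setting]" split from the log (assumption: the weight
# budget is total VRAM minus the reserved inference memory).
total_vram_mb = 4096.0   # "Total VRAM 4096 MB"
inference_mb = 950.0     # "inference_memory: 950.0"

weights_mb = total_vram_mb - inference_mb
weights_pct = 100 * weights_mb / total_vram_mb
compute_pct = 100 * inference_mb / total_vram_mb

print(f"weights: {weights_pct:.2f}% ({weights_mb:.0f} MB)")  # ~76.8%, ~3146 MB
print(f"compute: {compute_pct:.2f}% ({inference_mb:.0f} MB)")  # ~23.2%, 950 MB
```

On a 4 GB card this leaves very little headroom for the Flux weights, which is why offloading settings matter so much here.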
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': 'nf4'}
Using online LoRAs in FP16: False
Environment vars changed: {'stream': True, 'inference_memory': 950.0, 'pin_shared_memory': True}
[GPU Setting] You will use 76.80% GPU memory (3145.00 MB) to load weights, and use 23.20% GPU memory (950.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\Stable-diffusion\\flux1DevHyperNF4Flux1DevBNB_flux1DevHyperNF4.safetensors', 'hash': 'a005585e'}, 'additional_modules': ['D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\VAE\\ae.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\clip_l.safetensors', 'D:\\Data\\Packages\\Stable Diffusion WebUI Forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': 'nf4'}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 4.6s (unload existing model: 0.2s, forge model load: 4.3s).
NeverOOM Enabled for UNet (always maximize offload)
NeverOOM Enabled for VAE (always tiled)
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
VARM State Changed To NO_VRAM
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 13390.34 MB for cuda:0 with 0 models keep loaded ... Done.
Traceback (most recent call last):
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\txt2img.py", line 121, in txt2img_function
    processed = processing.process_images(p)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\processing.py", line 816, in process_images
    res = process_images_inner(p)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\processing.py", line 929, in process_images_inner
    p.setup_conds()
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\processing.py", line 1519, in setup_conds
    super().setup_conds()
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\processing.py", line 501, in setup_conds
    self.c = self.get_conds_with_caching(prompt_parser.get_multicond_learned_conditioning, prompts, total_steps, [self.cached_c], self.extra_network_data)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\processing.py", line 470, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\prompt_parser.py", line 262, in get_multicond_learned_conditioning
    learned_conditioning = get_learned_conditioning(model, prompt_flat_list, steps, hires_steps, use_old_scheduling)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\modules\prompt_parser.py", line 189, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\backend\diffusion_engine\flux.py", line 77, in get_learned_conditioning
    memory_management.load_model_gpu(self.forge_objects.clip.patcher)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\backend\memory_management.py", line 689, in load_model_gpu
    return load_models_gpu([model])
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\backend\memory_management.py", line 679, in load_models_gpu
    loaded_model.model_load(model_gpu_memory_when_using_cpu_swap)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\backend\memory_management.py", line 492, in model_load
    m._apply(lambda x: x.pin_memory())
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\venv\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
    param_applied = fn(param)
  File "D:\Data\Packages\Stable Diffusion WebUI Forge\backend\memory_management.py", line 492, in <lambda>
    m._apply(lambda x: x.pin_memory())
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
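As the error text itself notes, CUDA errors are reported asynchronously, so the blamed line may not be the real culprit. Setting `CUDA_LAUNCH_BLOCKING=1` before CUDA initializes makes kernel launches synchronous, so the traceback points at the actual failing call (at a performance cost). One way to do that, assuming it runs before any CUDA work:

```python
# Must be set before CUDA initializes (e.g. at the very top of launch.py,
# or as an environment variable in the shell that starts the webui).
# With synchronous launches, the reported stack trace matches the failing kernel.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```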


[Screenshot: FireShot Capture 002 - Stable Diffusion - 127.0.0.1]

Giribot · Sep 12 '24 06:09

Update: After today's daily update of Forge, and after setting "Swap Location" to [X] CPU instead of [_] Shared, it's working again... but loading the checkpoint is very slow.
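The crash happens because `memory_management.py` pins host memory unconditionally (`m._apply(lambda x: x.pin_memory())`) when `pin_shared_memory` is enabled, and pinning can itself fail with a CUDA out-of-memory error; switching Swap Location to CPU corresponds to `pin_shared_memory: False` in the log above. A hedged sketch of a defensive wrapper (a hypothetical helper illustrating the failure mode, not Forge's actual code):

```python
def pin_or_passthrough(tensor):
    """Try to page-lock (pin) a host tensor for faster host-to-GPU copies;
    on failure, keep ordinary pageable memory instead of aborting generation.

    `tensor` is anything exposing Torch's .pin_memory() interface.
    """
    try:
        return tensor.pin_memory()
    except RuntimeError:
        # pin_memory() can raise "CUDA error: out of memory" on low-VRAM
        # GPUs; falling back trades copy speed for stability.
        return tensor
```

This trade-off matches what the update comment describes: pageable (CPU) swap is stable but slower to load the checkpoint.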

Giribot · Sep 12 '24 10:09