stable-diffusion-webui-forge
Crashing after "GGUF state dict: {'Q8_0': 304}"
My Forge install is crashing during what appears to be model loading. I get this in my cmd logs:
venv "C:\Forge\stable-diffusion-webui-forge\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-471-g1e0e861b
Commit hash: 1e0e861b0de4b19326acfdbcb11a656adca8d57d
Launching Web UI with arguments: --pin-shared-memory --cuda-malloc --cuda-stream
Using cudaMallocAsync backend.
Total VRAM 8192 MB, total RAM 16045 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Always pin shared GPU memory
Device: cuda:0 NVIDIA GeForce RTX 3070 Laptop GPU : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: True
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\Forge\stable-diffusion-webui-forge\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
2024-08-30 09:13:34,126 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 22.1s (prepare environment: 5.8s, import torch: 7.9s, initialize shared: 0.2s, other imports: 0.6s, load scripts: 3.1s, create ui: 2.6s, gradio launch: 1.8s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1025.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.49% GPU memory (7166.00 MB) to load weights, and use 12.51% GPU memory (1025.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: gguf
Using pre-quant state dict!
GGUF state dict: {'Q8_0': 304}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': True}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
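Side note on the [GPU Setting] lines above: the split looks like simple arithmetic on the inference reserve. Here is a sketch of my reading (my own math, not Forge's actual code; the 1 MB held back is an assumption inferred from 8192 - 1024 = 7168 vs. the logged 7167.00 MB):

```python
# Reproduce the "[GPU Setting]" split from "Total VRAM 8192 MB".
# Sketch only: the 1 MB reserve is my inference from the logs.
TOTAL_VRAM_MB = 8192.0

def gpu_split(inference_mb: float, total_mb: float = TOTAL_VRAM_MB):
    weights_mb = total_mb - inference_mb - 1.0        # apparent 1 MB reserve
    weights_pct = 100.0 * (1.0 - inference_mb / total_mb)
    return weights_pct, weights_mb, 100.0 - weights_pct, inference_mb

w_pct, w_mb, c_pct, c_mb = gpu_split(1024.0)
print(f"You will use {w_pct:.2f}% GPU memory ({w_mb:.2f} MB) to load weights, "
      f"and use {c_pct:.2f}% GPU memory ({c_mb:.2f} MB) to do matrix computation.")
# inference_mb=1024.0 -> 87.50% (7167.00 MB) / 12.50% (1024.00 MB), as logged
```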
I get this error as well:
Repeating with different GPU memory settings to confirm those aren't the issue:
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-471-g1e0e861b
Commit hash: 1e0e861b0de4b19326acfdbcb11a656adca8d57d
Launching Web UI with arguments: --pin-shared-memory --cuda-malloc --cuda-stream
Using cudaMallocAsync backend.
Total VRAM 8192 MB, total RAM 16045 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Always pin shared GPU memory
Device: cuda:0 NVIDIA GeForce RTX 3070 Laptop GPU : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: True
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\Forge\stable-diffusion-webui-forge\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
2024-08-30 09:36:01,400 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 21.0s (prepare environment: 5.6s, import torch: 8.0s, initialize shared: 0.1s, other imports: 0.5s, load scripts: 2.5s, create ui: 2.4s, gradio launch: 1.6s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7167.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Default T5 Data Type: torch.float16
Using Detected UNet Type: gguf
Using pre-quant state dict!
GGUF state dict: {'Q8_0': 304}
Press any key to continue . . .
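Since the crash comes right after the "GGUF state dict" line, I also wanted to rule out a corrupt download. A quick sketch using the gguf pip package (assuming `pip install gguf`; the path is my local one) to count tensors per quantization type, similar to Forge's summary:

```python
# Count tensors per quantization type in the .gguf file, similar to
# Forge's "GGUF state dict: {'Q8_0': 304}" line. Sketch only; assumes
# `pip install gguf` and my local model path.
from collections import Counter
from gguf import GGUFReader

path = r"C:\Forge\stable-diffusion-webui-forge\models\Stable-diffusion\flux1-dev-Q8_0.gguf"
reader = GGUFReader(path)
print(dict(Counter(t.tensor_type.name for t in reader.tensors)))
# An intact file should read to completion; I'd expect Q8_0 tensors
# (plus some unquantized norm weights that Forge's summary may omit).
```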
Webui settings:
The Task Manager Performance tab shows a decrease in usage after the crash, but also, strangely, no GPU usage:
EDIT:
It works with smaller models (Q5_1 UNet with the fp8 T5 encoder). I believe the issue is that loading to CPU is reported as a "blocked method", which I hadn't seen before. Is there any way to correct this? See below:
venv "C:\Forge\stable-diffusion-webui-forge\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-471-g1e0e861b
Commit hash: 1e0e861b0de4b19326acfdbcb11a656adca8d57d
Launching Web UI with arguments: --cuda-malloc
Using cudaMallocAsync backend.
Total VRAM 8192 MB, total RAM 16045 MB
pytorch version: 2.3.1+cu121
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 Laptop GPU : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\Forge\stable-diffusion-webui-forge\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
2024-08-30 10:18:31,584 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp16.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 21.2s (prepare environment: 4.9s, import torch: 7.2s, initialize shared: 0.1s, other imports: 0.5s, list SD models: 1.3s, load scripts: 2.6s, create ui: 2.4s, gradio launch: 2.0s).
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q8_0.gguf', 'hash': 'b44b9b8a'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q5_1.gguf', 'hash': '3dabcacf'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q5_1.gguf', 'hash': '3dabcacf'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': True}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Environment vars changed: {'stream': True, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Environment vars changed: {'stream': False, 'inference_memory': 2048.0, 'pin_shared_memory': False}
[GPU Setting] You will use 75.00% GPU memory (6143.00 MB) to load weights, and use 25.00% GPU memory (2048.00 MB) to do matrix computation.
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q5_1.gguf', 'hash': '3dabcacf'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: True
Model selected: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q5_1.gguf', 'hash': '3dabcacf'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Loading Model: {'checkpoint_info': {'filename': 'C:\\Forge\\stable-diffusion-webui-forge\\models\\Stable-diffusion\\flux1-dev-Q5_1.gguf', 'hash': '3dabcacf'}, 'additional_modules': ['C:\\Forge\\stable-diffusion-webui-forge\\models\\VAE\\ae.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\clip_l.safetensors', 'C:\\Forge\\stable-diffusion-webui-forge\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: gguf
Using pre-quant state dict!
GGUF state dict: {'Q5_1': 304}
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'gguf', 'computation_dtype': torch.bfloat16}
Model loaded in 75.4s (unload existing model: 0.2s, forge model load: 75.1s).
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
[Unload] Trying to free 8747.54 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7104.60 MB, Model Require: 5153.49 MB, Inference Require: 2048.00 MB, Remaining: -96.90 MB, CPU Swap Loaded (blocked method): 1374.38 MB, GPU Loaded: 3851.49 MB
Moving model(s) has taken 5.01 seconds
Distilled CFG Scale: 3
[Unload] Trying to free 13218.47 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2630.01 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 7055.99 MB, Model Require: 8592.67 MB, Inference Require: 2048.00 MB, Remaining: -3584.68 MB, CPU Swap Loaded (blocked method): 4761.00 MB, GPU Loaded: 3831.67 MB
Moving model(s) has taken 19.27 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:56<00:00, 2.83s/it]
[Unload] Trying to free 2255.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3201.73 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 3201.73 MB, Model Require: 159.87 MB, Inference Require: 2048.00 MB, Remaining: 993.86 MB, All loaded to GPU.
Moving model(s) has taken 0.28 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:53<00:00, 2.67s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:53<00:00, 2.76s/it]
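For anyone else decoding the [Memory Management] lines: my reading of the math (a sketch, not Forge's code) is Remaining = Free GPU - Model Require - Inference Require, and when that goes negative, whole modules are swapped to CPU until the budget balances. That would explain why the swapped amount (4761.00 MB) overshoots the raw deficit (3584.68 MB), and with --cuda-stream off the swap is flagged as the slower "blocked method":

```python
# Hypothetical reconstruction of the offload math in the
# "[Memory Management] Target: KModel" line above. Module sizes are
# made up for illustration; only the budget formula comes from the log.
def plan_offload(free_mb, model_mb, inference_mb, module_sizes_mb):
    remaining = free_mb - model_mb - inference_mb   # -3584.68 MB in the log
    swapped = 0.0
    for size in module_sizes_mb:                    # offload whole modules
        if remaining >= 0:
            break
        swapped += size
        remaining += size
    return swapped, model_mb - swapped              # (CPU swap, GPU loaded)

# Numbers from the KModel line: Free 7055.99, Require 8592.67, Inference 2048.
swapped, gpu_loaded = plan_offload(7055.99, 8592.67, 2048.0,
                                   module_sizes_mb=[680.0] * 12)  # hypothetical
print(f"CPU swap ~{swapped:.2f} MB, GPU loaded ~{gpu_loaded:.2f} MB")
```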