LoRA patching is fixed, but not the generation
When using a LoRA, the crash happens at the very end of the generation. PC: RTX 3060 12 GB, 32 GB RAM. I use standard Forge settings with the swap location set to Shared (switching it to CPU didn't help).
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments:
Total VRAM 12287 MB, total RAM 32692 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
C:\AI\webui_forge_cu124_torch24\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\AI\webui_forge_cu124_torch24\webui\models\ControlNetPreprocessor
2024-08-18 11:53:00,451 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 14.5s (prepare environment: 2.7s, import torch: 6.1s, initialize shared: 0.1s, other imports: 0.9s, load scripts: 1.4s, create ui: 2.0s, gradio launch: 1.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 3.8s (unload existing model: 0.2s, forge model load: 3.6s).
[LORA] Loaded C:\AI\webui_forge_cu124_torch24\webui\models\Lora\ssery.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 8488.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11235.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1787.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4293.38 MB
Moving model(s) has taken 11.94 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16542.09 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5956.42 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11191.03 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1787.00 MB
[Memory Management] Estimated Remaining GPU Memory: -1946.04 MB
Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [03:10<00:00, 1.59it/s]
LoRA patching has taken 190.70 seconds
[Memory Management] Loaded to Shared Swap: 3231.77 MB (blocked method)
[Memory Management] Loaded to GPU: 8118.28 MB
Moving model(s) has taken 207.62 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:20<00:00, 4.01s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [01:14<00:00, 3.93s/it]
Begin to load 1 model
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 2798.68 MB ...
[Unload] Unload model KModel
And then it crashes, giving no output.
Same issue, almost the same specs as you, except I have 16 GB RAM. I can't get LoRAs working at all; it just blows up Forge.
I also forgot to mention that this error occurs with both flux1-dev-bnb-nf4.safetensors and flux1-dev-fp8.safetensors (flux1-dev-bnb-nf4-v2.safetensors wasn't tested).
I tried flux1-dev-bnb-nf4-v2.safetensors and got a different error:
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments:
Total VRAM 12287 MB, total RAM 32692 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
C:\AI\webui_forge_cu124_torch24\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\AI\webui_forge_cu124_torch24\webui\models\ControlNetPreprocessor
2024-08-18 15:16:02,866 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 16.6s (prepare environment: 3.3s, import torch: 7.1s, initialize shared: 0.2s, other imports: 1.2s, load scripts: 1.6s, create ui: 2.0s, gradio launch: 1.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1782.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 0.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 2.8s (unload existing model: 0.2s, forge model load: 2.5s).
[LORA] Loaded C:\AI\webui_forge_cu124_torch24\webui\models\Lora\ssery.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 6701.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11235.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 0.00 MB
[Memory Management] Estimated Remaining GPU Memory: 6080.38 MB
Moving model(s) has taken 12.01 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5955.11 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11189.72 MB
[Memory Management] Required Model Memory: 6246.84 MB
[Memory Management] Required Inference Memory: 0.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4942.88 MB
Patching LoRAs for KModel: 79%|██████████████████████████████████████████ | 241/304 [00:23<00:13, 4.56it/s]ERROR lora diffusion_model.single_blocks.17.linear1.weight CUDA out of memory. Tried to allocate 252.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 17.02 GiB is allocated by PyTorch, and 1.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Patching LoRA weights failed. Retrying by offloading models.
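As an aside, the suggestion inside that OOM message (`PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`) only takes effect if the variable is in the environment before torch initializes CUDA. On Windows the usual route is a `set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` line in whatever .bat you launch Forge with (webui-user.bat in a git install); a rough Python equivalent, for anyone starting Forge through their own wrapper script, would be something like this sketch (the wrapper itself is hypothetical, not part of Forge):

```python
# Hypothetical launcher wrapper: apply the allocator option suggested by the
# OOM message before torch/CUDA makes its first allocation.
import os

# Must be set before the first CUDA allocation happens.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported after the env var so the allocator can pick it up

print(torch.cuda.get_device_name(0))          # sanity check: CUDA still works
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])  # confirm the setting is active
```

Whether expandable segments actually helps here is a separate question; it only addresses fragmentation, not genuinely running out of VRAM.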
LoRAs worked properly until a recent update that removed the need to re-patch the LoRA for each generation.
I tried with v2 too, and it patched the LoRA with no errors, but I had to interrupt the generation: an image that usually finishes in under 40 seconds without a LoRA took around a minute just to reach 10% with one:
[LORA] Loaded C:\WebUI\webui_forge_cu121_torch231\webui\models\Lora\araminta_k_flux_film_foto.safetensors for KModel-UNet with 494 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
Reuse 1 loaded models
[Unload] Trying to free 7819.24 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9851.10 MB ...
[Memory Management] Current Free GPU Memory: 9851.10 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 8827.10 MB
Moving model(s) has taken 0.01 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
Reuse 1 loaded models
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9846.12 MB ...
[Memory Management] Current Free GPU Memory: 9846.12 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 8822.12 MB
Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [00:14<00:00, 21.02it/s]
LoRA patching has taken 15.43 seconds
Moving model(s) has taken 16.40 seconds
10%|████████▎ | 2/20 [01:13<10:57, 36.50s/it]
[Unload] Trying to free 4287.94 MB for cuda:0 with 1 models keep loaded ... | 2/20 [00:36<05:26, 18.14s/it]
[Unload] Current free memory is 25603.63 MB ...
Memory cleanup has taken 0.89 seconds
Total progress: 10%|██████▋ | 2/20 [01:13<10:57, 36.50s/it]
Total progress: 10%|██████▋ | 2/20 [01:13<05:26, 18.14s/it]
Okay, so I had the same problem, not just with Flux but with 1.5 models and LoRAs as well. I got this when I tried to generate a 512px image with a LoRA that took around 10 seconds in Automatic:
Begin to load 1 model
[Unload] Trying to free 32877.42 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9496.57 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 11147.52 MB
[Memory Management] Required Model Memory: 159.56 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 9963.96 MB
Moving model(s) has taken 0.39 seconds
The problem is apparent: it's trying to free 33 GB of VRAM when it shows that it actually needs a little under 2 GB. In OP's post, you'll see it's trying to free far more than his total VRAM. This really shouldn't be happening, and I think it's something wrong with Forge itself, because this doesn't happen in Automatic; I checked just to be sure.
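For what it's worth, that absurd 953674316406250018963456.00 MB figure in OP's log is probably not a real estimate but a "free everything" sentinel: it is exactly what you get when 1e30 bytes (as a 64-bit float) is converted to MB and printed, which suggests Forge's memory manager is simply being asked to unload everything before the first model load. A quick check in plain Python, nothing Forge-specific:

```python
# The "Trying to free 953674316406250018963456.00 MB" line matches a sentinel
# request of 1e30 bytes ("free everything"), printed in MB after float rounding.
sentinel_bytes = 1e30                    # huge placeholder, not a real size
as_mb = sentinel_bytes / (1024 * 1024)   # bytes -> MB
print(f"{as_mb:.2f} MB")                 # -> 953674316406250018963456.00 MB
```

So the suspicious part isn't that single line; the later, smaller "Trying to free ..." amounts are where the LoRA-related estimates would show up.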
Can confirm that this happens with a variety of models when a LoRA is used. It tries to reserve an unreasonable amount of memory; in my case it goes for an absurd 953674316406250018963456.00 MB. On any subsequent model load the amounts are far more reasonable and proper.
EDIT: Forgot to mention, the reason it even gets that far is that I have system memory fallback disabled for Python, so Forge's memory management in general gets funky that way. But it's the only way to get good speeds; we're talking twice as fast.
Okay, so I have a question for you, as I am not very knowledgeable about Python or its settings: are you saying that disabling that option is what allows it to request that absurd amount, or that the problem is caused by having it disabled?
@ArmadstheDoom Well, this is a weird side effect it has on Forge's memory management. I'm talking about disabling system memory fallback for Python in Nvidia's Control Panel. The fallback feature itself was quietly added in one of the Nvidia driver updates, without a way to disable it until much later, and it also made gen times much worse. It lets an application spill into RAM when it runs out of VRAM; Nvidia added it to prevent applications from crashing on out-of-memory, but in my case it butchered gen times. Disabling it means Forge can no longer fall back to RAM, and I think something about the way LoRAs are handled makes them try to use as much memory as they can.
I have never heard of this before; I may need to look into it. At the very least, though, we know it's not a real solution for the problem, since we're both experiencing it regardless of whether that option is in use.
Too many problems with LoRAs right now.
Edit: after today's update I have not noticed any changes from my previous message.
I can confirm that the latest update did not fix the problems. All errors remained in the same form as before.
Well, to add some things: the weirdly huge amount of memory the UI tried to free had nothing to do with sysmem fallback for me. It's just like that for some reason, trying to free obscene amounts of memory before loading the models. It also turned out that my own issues were mostly RAM OOMs, not VRAM; that was easy to fix by increasing the allocated virtual memory, but it's still an awfully large amount of memory it consumes with LoRAs. To me it always seemed like it doesn't calculate the memory needed for LoRAs correctly, and on top of that the recent changes make the models unload on every gen.
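If you suspect the same thing, a quick way to check whether it's RAM (or the pagefile) rather than VRAM that runs out is to watch free memory while the "Patching LoRAs for KModel" step runs, for example with a small psutil script in a second console (just a monitoring sketch, not part of Forge; assumes `pip install psutil`):

```python
# Rough RAM / pagefile monitor: run this while Forge is patching LoRAs and
# watch whether available memory collapses toward zero.
import time
import psutil

while True:
    vm = psutil.virtual_memory()   # physical RAM
    sm = psutil.swap_memory()      # pagefile / swap
    print(
        f"RAM available: {vm.available / 2**30:6.2f} GiB "
        f"({vm.percent:5.1f}% used) | "
        f"swap used: {sm.used / 2**30:6.2f} GiB ({sm.percent:5.1f}%)"
    )
    time.sleep(2)
```

If RAM and the pagefile both fill up right before the crash, increasing virtual memory (as above) is the workaround that helped here.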
I can confirm that the latest update did not fix the problems. All errors remained in the same form as before.
Me too. I'm still getting the "Connection timeout" popup in the browser, and "Patching LoRAs for KModel" stops counting and freezes my whole PC. I tried both flux1-dev-fp8 and flux1-dev-bnb-nf4; nothing changed :/
Same problem here with 12 GB of VRAM.
Setting "Diffusion in low bits" to "Automatic (fp16 LoRA)" fixed the problem for me.
Yeah... now the patching is fixed, but it doesn't finish the generation. It gets to 95% and then crashes.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-391-g2b1e7366
Commit hash: 2b1e7366a7e9851d013d473e130478120f25e31e
Launching Web UI with arguments: --xformers --skip-torch-cuda-test --no-half-vae --disable-safe-unpickle --ckpt-dir 'G:\CKPT' --vae-dir 'G:\VAE' --lora-dir 'G:\Lora' --esrgan-models-path 'G:\ESRGAN' --cuda-malloc
Using cudaMallocAsync backend.
Total VRAM 12288 MB, total RAM 32735 MB
pytorch version: 2.3.1+cu121
xformers version: 0.0.27
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : cudaMallocAsync
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: False
H:\webui_forge_cu121_torch231\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
Using xformers cross attention
Using xformers attention for VAE
ControlNet preprocessor location: H:\webui_forge_cu121_torch231\webui\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
2024-08-21 17:27:29,076 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'G:\CKPT\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
Using online LoRAs in FP16: False
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Startup time: 23.4s (prepare environment: 1.4s, import torch: 10.1s, initialize shared: 0.2s, other imports: 0.6s, load scripts: 3.5s, create ui: 4.0s, gradio launch: 3.5s).
Model selected: {'checkpoint_info': {'filename': 'G:\CKPT\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e5m2}
Using online LoRAs in FP16: True
Model selected: {'checkpoint_info': {'filename': 'G:\CKPT\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
Using online LoRAs in FP16: True
Loading Model: {'checkpoint_info': {'filename': 'G:\CKPT\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.float16}
Model loaded in 1.3s (unload existing model: 0.3s, forge model load: 1.0s).
[LORA] Loaded G:\Lora\AniVerse_flux_lora_01-AdamW-3e-4-RunPod-A6000Ada-bs3.safetensors for KModel-UNet with 494 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11195.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 5016.38 MB
Moving model(s) has taken 3.28 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16700.83 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5926.55 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11153.66 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: -1220.41 MB
Patching LoRAs for KModel: 100%|██████████████████████████████████████████████████| 304/304 [00:00<00:00, 38041.30it/s]
[Memory Management] Loaded to CPU Swap: 2502.59 MB (blocked method)
[Memory Management] Loaded to GPU: 8847.46 MB
Moving model(s) has taken 8.82 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:44<00:00, 5.21s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [01:29<00:00, 4.73s/it]
Begin to load 1 model
[Unload] Trying to free 8991.55 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1630.61 MB ...
[Unload] Unload model KModel
Premere un tasto per continuare . . . ("Press any key to continue . . .")
I fixed it for me: I just changed "Diffusion in low bits" to "Automatic (fp16 LoRA)". It skips patching LoRAs as well. I hope this helps!