stable-diffusion-webui-forge

LoRA patching fixed, but not the generation

Open SirVirgo opened this issue 1 year ago • 15 comments

When using a LoRA, the crash happens at the very end of generation. PC: RTX 3060 12 GB, 32 GB RAM. I use the standard Forge settings with the swap location set to Shared (switching to CPU didn't help).

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments:
Total VRAM 12287 MB, total RAM 32692 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
C:\AI\webui_forge_cu124_torch24\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\AI\webui_forge_cu124_torch24\webui\models\ControlNetPreprocessor
2024-08-18 11:53:00,451 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 14.5s (prepare environment: 2.7s, import torch: 6.1s, initialize shared: 0.1s, other imports: 0.9s, load scripts: 1.4s, create ui: 2.0s, gradio launch: 1.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-fp8.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.bfloat16}
Model loaded in 3.8s (unload existing model: 0.2s, forge model load: 3.6s).
[LORA] Loaded C:\AI\webui_forge_cu124_torch24\webui\models\Lora\ssery.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 8488.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11235.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1787.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4293.38 MB
Moving model(s) has taken 11.94 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16542.09 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5956.42 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11191.03 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1787.00 MB
[Memory Management] Estimated Remaining GPU Memory: -1946.04 MB
Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [03:10<00:00,  1.59it/s]
LoRA patching has taken 190.70 seconds
[Memory Management] Loaded to Shared Swap: 3231.77 MB (blocked method)
[Memory Management] Loaded to GPU: 8118.28 MB
Moving model(s) has taken 207.62 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:20<00:00,  4.01s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [01:14<00:00,  3.93s/it]
Begin to load 1 model
[Unload] Trying to free 4495.77 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 2798.68 MB ...
[Unload] Unload model KModel

And then it crashes, giving no output.

SirVirgo avatar Aug 18 '24 08:08 SirVirgo

Same issue, and almost the same specs as you too, except I have 16 GB RAM. I cannot get LoRAs working at all; it just blows up Forge.

queenofinvidia avatar Aug 18 '24 09:08 queenofinvidia

I also forgot to mention that this error occurs with both flux1-dev-bnb-nf4.safetensors and flux1-dev-fp8.safetensors (flux1-dev-bnb-nf4-v2.safetensors wasn't tested).

SirVirgo avatar Aug 18 '24 09:08 SirVirgo

I tried flux1-dev-bnb-nf4-v2.safetensors and got a different error:

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-323-g72ab92f8
Commit hash: 72ab92f83e5a9e193726313c6d88ab435a61fb59
Launching Web UI with arguments:
Total VRAM 12287 MB, total RAM 32692 MB
pytorch version: 2.4.0+cu124
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3060 : native
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
C:\AI\webui_forge_cu124_torch24\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Using pytorch cross attention
Using pytorch attention for VAE
ControlNet preprocessor location: C:\AI\webui_forge_cu124_torch24\webui\models\ControlNetPreprocessor
2024-08-18 15:16:02,866 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 16.6s (prepare environment: 3.3s, import torch: 7.1s, initialize shared: 0.2s, other imports: 1.2s, load scripts: 1.6s, create ui: 2.0s, gradio launch: 1.2s).
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1787.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 1782.0, 'pin_shared_memory': True}
Environment vars changed: {'stream': False, 'inference_memory': 0.0, 'pin_shared_memory': True}
Loading Model: {'checkpoint_info': {'filename': 'C:\\AI\\webui_forge_cu124_torch24\\webui\\models\\Stable-diffusion\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 1722, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: nf4
Using pre-quant state dict!
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': 'nf4', 'computation_dtype': torch.bfloat16}
Model loaded in 2.8s (unload existing model: 0.2s, forge model load: 2.5s).
[LORA] Loaded C:\AI\webui_forge_cu124_torch24\webui\models\Lora\ssery.safetensors for KModel-UNet with 304 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 6701.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11235.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 0.00 MB
[Memory Management] Estimated Remaining GPU Memory: 6080.38 MB
Moving model(s) has taken 12.01 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5955.11 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11189.72 MB
[Memory Management] Required Model Memory: 6246.84 MB
[Memory Management] Required Inference Memory: 0.00 MB
[Memory Management] Estimated Remaining GPU Memory: 4942.88 MB
Patching LoRAs for KModel:  79%|██████████████████████████████████████████           | 241/304 [00:23<00:13,  4.56it/s]ERROR lora diffusion_model.single_blocks.17.linear1.weight CUDA out of memory. Tried to allocate 252.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 17.02 GiB is allocated by PyTorch, and 1.89 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Patching LoRA weights failed. Retrying by offloading models.

LoRAs worked properly until a recent update that removed the need to patch the LoRA for each generation.
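
As a side note, the OOM message above itself suggests one mitigation: setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` before CUDA is initialized, to reduce allocator fragmentation. A minimal launcher sketch is below; the install path is taken from the logs in this thread, and `run.bat` as the one-click package's entry point is an assumption, so adjust for your setup:

```python
# Hedged sketch: export the allocator setting suggested by the PyTorch OOM
# message, then start Forge. The variable must be in the environment before
# the first CUDA allocation happens inside the webui process.
import os
import subprocess

env = os.environ.copy()
env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Path and launcher name are assumptions based on the logs in this thread.
subprocess.run(
    ["cmd", "/c", "run.bat"],
    cwd=r"C:\AI\webui_forge_cu124_torch24",
    env=env,
    check=True,
)
```

This does not change how much memory LoRA patching asks for; it only makes the allocator more tolerant of fragmentation, so it may or may not help with the crash described here.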

SirVirgo avatar Aug 18 '24 11:08 SirVirgo

I tried with v2 too, and it patched the LoRA with no errors, but I had to interrupt the generation: an image that usually takes less than 40 seconds to finish without a LoRA took around a minute just to reach 10% with one:

[LORA] Loaded C:\WebUI\webui_forge_cu121_torch231\webui\models\Lora\araminta_k_flux_film_foto.safetensors for KModel-UNet with 494 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
Reuse 1 loaded models
[Unload] Trying to free 7819.24 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9851.10 MB ...
[Memory Management] Current Free GPU Memory: 9851.10 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 8827.10 MB
Moving model(s) has taken 0.01 seconds
Distilled CFG Scale: 3
To load target model KModel
Begin to load 1 model
Reuse 1 loaded models
[Unload] Trying to free 9411.13 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9846.12 MB ...
[Memory Management] Current Free GPU Memory: 9846.12 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 8822.12 MB
Patching LoRAs for KModel: 100%|█████████████████████████████████████████████████████| 304/304 [00:14<00:00, 21.02it/s]
LoRA patching has taken 15.43 seconds
Moving model(s) has taken 16.40 seconds
 10%|████████▎                                                                          | 2/20 [01:13<10:57, 36.50s/it]
[Unload] Trying to free 4287.94 MB for cuda:0 with 1 models keep loaded ...             | 2/20 [00:36<05:26, 18.14s/it]
[Unload] Current free memory is 25603.63 MB ...
Memory cleanup has taken 0.89 seconds
Total progress:  10%|██████▋                                                            | 2/20 [01:13<10:57, 36.50s/it]
Total progress:  10%|██████▋                                                            | 2/20 [01:13<05:26, 18.14s/it]

Dravoss avatar Aug 18 '24 17:08 Dravoss

Okay, so I had the same problem, not JUST with Flux but with 1.5 models and LoRAs as well. I got this when I tried to generate a 512px image with a LoRA, something that takes around 10 seconds in Automatic:

Begin to load 1 model
[Unload] Trying to free 32877.42 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 9496.57 MB ...
[Unload] Unload model KModel
[Memory Management] Current Free GPU Memory: 11147.52 MB
[Memory Management] Required Model Memory: 159.56 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 9963.96 MB
Moving model(s) has taken 0.39 seconds

The problem is apparent: it's trying to free about 33 GB of VRAM when it shows that it actually needs a little under 2 GB. In OP's post, you'll see it's trying to free far more memory than his card even has. This really shouldn't be happening, and I think it's something wrong with Forge itself, because this doesn't happen in Automatic. I checked just to be sure.
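
For what it's worth, the absurd figure in OP's log looks like a deliberate "free everything" placeholder rather than a real estimate: it is exactly what you get when 1e30 bytes is converted to MB. This is inferred from the arithmetic below, not confirmed against Forge's code:

```python
# Reproduce the "Trying to free ..." figure from the logs, assuming it is a
# 1e30-byte sentinel meaning "unload everything" rather than a measured value.
sentinel_bytes = 1e30
print(f"{sentinel_bytes / (1024 * 1024):.2f} MB")
# prints: 953674316406250018963456.00 MB  (matches the log line exactly)
```

If so, the scary first-load number is cosmetic, and the more telling figure is the 32877.42 MB requested in the log just above.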

ArmadstheDoom avatar Aug 18 '24 22:08 ArmadstheDoom

Can confirm that this happens with a variety of models when a LoRA is used. It tries to reserve an unreasonable amount of memory; in my case it goes for an absurd 953674316406250018963456.00 MB. On any subsequent model load it asks for far more reasonable, proper amounts of memory.

EDIT: Forgot to mention, the reason it even gets that far is that I have system memory fallback disabled for Python, so memory management in Forge generally gets funky that way. But it is the only way to get good speeds; we're talking twice as fast.

UmbralMoth avatar Aug 18 '24 23:08 UmbralMoth

Can confirm that this happens with a variety of models when a LoRA is used. It tries to reserve an unreasonable amount of memory; in my case it goes for an absurd 953674316406250018963456.00 MB. On any subsequent model load it asks for far more reasonable, proper amounts of memory.

EDIT: Forgot to mention, the reason it even gets that far is that I have system memory fallback disabled for Python, so memory management in Forge generally gets funky that way. But it is the only way to get good speeds; we're talking twice as fast.

Okay, so I have a question for you, as I am not very knowledgeable about Python or why you would set it up that way.

Are you saying that disabling that is what allows it to request that absurd amount, or that this is caused by having it disabled?

ArmadstheDoom avatar Aug 18 '24 23:08 ArmadstheDoom

@ArmadstheDoom Well, this is a weird side effect it has on Forge's memory management. Disabling system memory fallback for Python in Nvidia's Control Panel prevents the driver from spilling over into RAM when you run out of VRAM. (The fallback feature was quietly added in one of the Nvidia driver updates, without a way to disable it until much later, and it also made gen times much worse.) It was added to the drivers to prevent applications from crashing when running out of memory, but in my case it butchered gen times, and I think the way LoRAs are handled causes them to attempt to use as much memory as they can.

UmbralMoth avatar Aug 18 '24 23:08 UmbralMoth

I have never heard of this before; I may need to look into it. At the very least, we know it's not a real solution to the problem, since we're both experiencing it regardless of whether the option is in use.

ArmadstheDoom avatar Aug 19 '24 00:08 ArmadstheDoom

Too many problems with LoRAs right now.

elen07zz avatar Aug 19 '24 05:08 elen07zz

edit: after today's update I have not noticed any changes from my previous message

Dravoss avatar Aug 19 '24 12:08 Dravoss

I can confirm that the latest update did not fix the problems. All errors remained in the same form as before.

SirVirgo avatar Aug 19 '24 14:08 SirVirgo

Well, to add some things: the weirdly huge amount of memory the UI tried to free had nothing to do with sysmem fallback in my case. It is just like that for some reason, trying to free obscene amounts of memory before loading the models. It also turned out that my personal issues were mostly due to RAM OOMing and not VRAM, which was easy to fix by ramping up the allocated virtual memory, but that is still an awfully large amount of memory it consumes with LoRAs. To me it always seemed like it doesn't calculate the memory needed for LoRAs correctly. And on top of that, the recent changes make the models unload with every gen.
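
If anyone wants to check whether it is system RAM (and the pagefile) rather than VRAM that runs out during LoRA patching, a rough monitor like the sketch below can be run in a second terminal while generating. It assumes the psutil and nvidia-ml-py packages are installed; neither is part of Forge.

```python
# Rough RAM/VRAM monitor to run alongside Forge while a LoRA is being patched.
# Assumes: pip install psutil nvidia-ml-py   (external tools, not Forge requirements)
import time

import psutil
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU, e.g. the RTX 3060 here

while True:
    ram = psutil.virtual_memory()
    vram = pynvml.nvmlDeviceGetMemoryInfo(gpu)
    print(
        f"RAM {ram.used / 2**30:5.1f}/{ram.total / 2**30:.1f} GiB | "
        f"VRAM {vram.used / 2**30:5.1f}/{vram.total / 2**30:.1f} GiB"
    )
    time.sleep(2)
```

If RAM climbs to its limit right when "Patching LoRAs for KModel" stalls, that points at the same RAM/pagefile exhaustion described above rather than a GPU OOM.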

UmbralMoth avatar Aug 19 '24 14:08 UmbralMoth

I can confirm that the latest update did not fix the problems. All errors remained in the same form as before.

Me too. I'm still getting the "Connection timeout" popup in the browser, and "Patching LoRAs for KModel" stops counting and freezes my whole PC. I tried both flux1-dev-fp8 and flux1-dev-bnb-nf4; nothing changed :/

moudahaddad14 avatar Aug 19 '24 15:08 moudahaddad14

Same problem here with 12 GB of VRAM.

Samael-1976 avatar Aug 19 '24 19:08 Samael-1976

Setting "Diffusion in low bits" to "Automatic (fp16 LoRA)" fixed the problem.

SirVirgo avatar Aug 21 '24 14:08 SirVirgo

Setting "Diffusion in low bits" to "Automatic (fp16 LoRA)" fixed the problem.

Yeah... now the patching is fixed, but it doesn't finish the generation. It gets to 95% and then crashes.

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-391-g2b1e7366
Commit hash: 2b1e7366a7e9851d013d473e130478120f25e31e
Launching Web UI with arguments: --xformers --skip-torch-cuda-test --no-half-vae --disable-safe-unpickle --ckpt-dir 'G:\CKPT' --vae-dir 'G:\VAE' --lora-dir 'G:\Lora' --esrgan-models-path 'G:\ESRGAN' --cuda-malloc
Using cudaMallocAsync backend.
Total VRAM 12288 MB, total RAM 32735 MB
pytorch version: 2.3.1+cu121
xformers version: 0.0.27
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 2060 : cudaMallocAsync
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: False
H:\webui_forge_cu121_torch231\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Using xformers cross attention
Using xformers attention for VAE
ControlNet preprocessor location: H:\webui_forge_cu121_torch231\webui\models\ControlNetPreprocessor
[-] ADetailer initialized. version: 24.8.0, num models: 10
2024-08-21 17:27:29,076 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'G:\\CKPT\\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 23.4s (prepare environment: 1.4s, import torch: 10.1s, initialize shared: 0.2s, other imports: 0.6s, load scripts: 3.5s, create ui: 4.0s, gradio launch: 3.5s).
Model selected: {'checkpoint_info': {'filename': 'G:\\CKPT\\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e5m2}
Using online LoRAs in FP16: True
Model selected: {'checkpoint_info': {'filename': 'G:\\CKPT\\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
Using online LoRAs in FP16: True
Loading Model: {'checkpoint_info': {'filename': 'G:\\CKPT\\flux1-dev-fp8-full.safetensors', 'hash': 'be9881f4'}, 'additional_modules': [], 'unet_storage_dtype': torch.float8_e4m3fn}
[Unload] Trying to free 953674316406250018963456.00 MB for cuda:0 with 0 models keep loaded ...
StateDict Keys: {'transformer': 780, 'vae': 244, 'text_encoder': 198, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.float16}
Model loaded in 1.3s (unload existing model: 0.3s, forge model load: 1.0s).
[LORA] Loaded G:\Lora\AniVerse_flux_lora_01-AdamW-3e-4-RunPod-A6000Ada-bs3.safetensors for KModel-UNet with 494 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Unload] Trying to free 7725.00 MB for cuda:0 with 0 models keep loaded ...
[Memory Management] Current Free GPU Memory: 11195.00 MB
[Memory Management] Required Model Memory: 5154.62 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 5016.38 MB
Moving model(s) has taken 3.28 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
[Unload] Trying to free 16700.83 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 5926.55 MB ...
[Unload] Unload model JointTextEncoder
[Memory Management] Current Free GPU Memory: 11153.66 MB
[Memory Management] Required Model Memory: 11350.07 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: -1220.41 MB
Patching LoRAs for KModel: 100%|██████████████████████████████████████████████████| 304/304 [00:00<00:00, 38041.30it/s]
[Memory Management] Loaded to CPU Swap: 2502.59 MB (blocked method)
[Memory Management] Loaded to GPU: 8847.46 MB
Moving model(s) has taken 8.82 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:44<00:00, 5.21s/it]
To load target model IntegratedAutoencoderKL███████████████████████████████████████████| 20/20 [01:29<00:00, 4.73s/it]
Begin to load 1 model
[Unload] Trying to free 8991.55 MB for cuda:0 with 0 models keep loaded ...
[Unload] Current free memory is 1630.61 MB ...
[Unload] Unload model KModel
Premere un tasto per continuare . . . [Italian: "Press any key to continue . . ." — the process has already exited]

Samael-1976 avatar Aug 21 '24 15:08 Samael-1976

I fixed it for me: I just changed "Diffusion in low bits" to "Automatic (LoRA in fp16)" and it skips patching LoRAs as well. I hope this helps!

DisabledE avatar Aug 23 '24 14:08 DisabledE