
[Bug]: crash when using sdxl loras

Status: Open · TongfanWeitf opened this issue 11 months ago · 49 comments

Checklist

  • [X] The issue exists after disabling all extensions
  • [X] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [X] The issue exists in the current version of the webui
  • [X] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

If I use SDXL LoRAs, the webui crashes.

Steps to reproduce the problem

1. Run the webui.
2. Run a txt2img generation with an SDXL model and a LoRA (an example LoRA tag is shown below).
3. The webui crashes.
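(For reference, not from the original report: a LoRA is typically pulled in by adding its tag to the prompt; the name and weight below are placeholders.)

<lora:some_sdxl_lora:0.8>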

What should have happened?

The image should have been generated and returned successfully.

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

sysinfo-2024-03-07-18-39.json

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Launching Web UI with arguments: --xformers --no-half-vae --no-half --medvram-sdxl
Loading weights [67ab2fd8ec] from D:\ai\webui\models\Stable-diffusion\ponyDiffusionV6XL_v6StartWithThisOne.safetensors
Creating model from config: D:\ai\webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 16.4s (prepare environment: 3.4s, import torch: 5.9s, import gradio: 0.6s, setup paths: 0.8s, initialize shared: 3.2s, other imports: 0.6s, load scripts: 0.8s, create ui: 0.5s, gradio launch: 0.6s).
Loading VAE weights specified in settings: D:\ai\webui\models\VAE\sdxl_vae.safetensors
Applying attention optimization: xformers... done.
Model loaded in 20.6s (load weights from disk: 0.7s, create model: 1.9s, apply weights to model: 7.4s, apply float(): 4.6s, load VAE: 0.7s, calculate empty prompt: 5.3s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:25<00:00,  1.28s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:27<00:00,  1.40s/it]
  0%|                                                                                           | 0/20 [00:00<?, ?it/s] 请按任意键继续. . .
// The first bar is without the LoRA and the second one is with it. It crashed, so there are no error messages. The Chinese at the end means "press any key to continue...".

Additional information

It is weird, because I could run SDXL with LoRAs before. One day I suddenly couldn't load SDXL models anymore (PyTorch allocated 10.6 GB, which is much more than before), so I added --medvram-sdxl. Now I can load SDXL models, but I still can't use LoRAs.

TongfanWeitf avatar Mar 07 '24 18:03 TongfanWeitf

I updated my GPU driver; now it reaches the end of the progress bar:

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Launching Web UI with arguments: --xformers --no-half-vae --no-half --medvram-sdxl
Loading weights [67ab2fd8ec] from D:\ai\webui\models\Stable-diffusion\ponyDiffusionV6XL_v6StartWithThisOne.safetensors
Creating model from config: D:\ai\webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 13.2s (prepare environment: 3.1s, import torch: 6.0s, import gradio: 0.6s, setup paths: 0.8s, initialize shared: 0.2s, other imports: 0.5s, load scripts: 0.8s, create ui: 0.5s, gradio launch: 0.5s).
Applying attention optimization: xformers... done.
Model loaded in 16.6s (load weights from disk: 0.7s, create model: 2.2s, apply weights to model: 7.4s, apply float(): 4.9s, calculate empty prompt: 1.3s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:51<00:00, 2.57s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:33<00:00, 1.75s/it]

But it still crashes after that (meaning I never receive the image; I can see it being generated, but it crashes the moment it finishes, with the ETA in the webui at roughly 90% / 3s left).

TongfanWeitf avatar Mar 07 '24 19:03 TongfanWeitf

Can you check in windows event logs for any related messages? Python crash details? Resource exhaustion?

nailz420 avatar Mar 07 '24 20:03 nailz420

Can you check in windows event logs for any related messages? Python crash details? Resource exhaustion?

I already solved it. When I want to load an SDXL model, I launch with --medvram-sdxl, then I restart without --medvram-sdxl, and then I can use SDXL models. If I want to load another model, I restart with --medvram-sdxl again and repeat the process.
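For anyone trying the same workaround: on Windows the flags live in webui-user.bat, so this amounts to switching between two launch lines across restarts. A minimal sketch, where the other arguments are simply the ones from the log above:

REM launch used only when loading/switching the SDXL checkpoint
set COMMANDLINE_ARGS=--xformers --no-half-vae --no-half --medvram-sdxl

REM normal launch once the checkpoint has been loaded
set COMMANDLINE_ARGS=--xformers --no-half-vae --no-half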

TongfanWeitf avatar Mar 07 '24 20:03 TongfanWeitf

Can you check in windows event logs for any related messages? Python crash details? Resource exhaustion?

I already solved it. When I want to load an SDXL model, I launch with --medvram-sdxl, then I restart without --medvram-sdxl, and then I can use SDXL models. If I want to load another model, I restart with --medvram-sdxl again and repeat the process.

That doesn't sound like a solution, just a clunky workaround. The issue still exists if you have to do that.

nailz420 avatar Mar 08 '24 04:03 nailz420

I have the same issue, but the solution of using --medvram-sdxl then restarting does not solve it for me.

The sdxl model works fine on its own, but as soon as I add the lora it crashes. Typically as soon as I try to generate, my computer temporarily locks up, then the ui crashes with this in the console:

0%| | 0/8 [00:00<?, ?it/s]./webui.sh: line 292: 11929 Killed "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

Running Linux Mint. Using --no-half --precision full

Edit: I have discovered that when generating with a LoRA, it consumes all available RAM and then crashes. I suspect it is loading the entire base model into memory again, creating a copy and doubling memory usage. When only the base model is loaded, RAM usage is less than 20 GB of 32 GB. When I start the generation with the LoRA, roughly 3 GB is added every second until it hits the full 32 GB, which is when it either crashes or locks the PC.

I confirmed this by using SD 1.5, which is small enough that I can have 2 full copies in memory. Generating with a lora again starts by consuming a bunch of RAM, but stops at about 6gb of additional memory (the lora itself is only 150mb), then everything works and the image generates just fine.

I'm guessing that isn't supposed to happen? Shouldn't it either use the model already in memory, or free up that space if it is going to reload the whole thing?
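A quick way to watch this from outside the webui (my own sketch, not from this thread; it assumes psutil is installed somewhere and you pass in the PID of the webui's python process) is to poll the process RSS once a second during a generation:

import sys
import time

import psutil  # pip install psutil

# PID of the running webui python process, passed on the command line
proc = psutil.Process(int(sys.argv[1]))

while True:
    rss_gib = proc.memory_info().rss / 1024 ** 3  # resident RAM in GiB
    print(f"webui RSS: {rss_gib:.2f} GiB")
    time.sleep(1)

If the number only climbs by a few GiB per step while a LoRA is active, that matches the duplicated-checkpoint theory above.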

aurelion314 avatar Mar 27 '24 00:03 aurelion314

I have the same issue, but the solution of using --medvram-sdxl then restarting does not solve it for me.

The sdxl model works fine on its own, but as soon as I add the lora it crashes. Typically as soon as I try to generate, my computer temporarily locks up, then the ui crashes with this in the console:

0%| | 0/8 [00:00<?, ?it/s]./webui.sh: line 292: 11929 Killed "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

Running Linux Mint. Using --no-half --precision full

Edit: I have discovered that when generating with a LoRA, it consumes all available RAM and then crashes. I suspect it is loading the entire base model into memory again, creating a copy and doubling memory usage. When only the base model is loaded, RAM usage is less than 20 GB of 32 GB. When I start the generation with the LoRA, roughly 3 GB is added every second until it hits the full 32 GB, which is when it either crashes or locks the PC.

I confirmed this by using SD 1.5, which is small enough that I can have 2 full copies in memory. Generating with a lora again starts by consuming a bunch of RAM, but stops at about 6gb of additional memory (the lora itself is only 150mb), then everything works and the image generates just fine.

I'm guessing that isn't supposed to happen? Shouldn't it either use the model already in memory, or free up that space if it is going to reload the whole thing?

I can confirm that when using an SDXL model (in my case PonyDiffusion), RAM consumption increases with every second of generation until it reaches 15.6 GB (I have 16 GB) and the swap file starts to be used. Moreover, if I use a LoRA, the RAM gets clogged up after the first generation and it says "Press any key...". I clearly remember using SDXL models 2 months ago without any problems; I don't remember this kind of RAM consumption. And the error is not on the GPU's video memory side. Has anyone found a solution?

aearone avatar Mar 28 '24 16:03 aearone

I figured out what the problem is. The issue is a RAM leak in the latest version of the webui, i.e. during generation RAM usage grows every second until it reaches the swap file limit. In webui version 1.7.0 there was no such thing. RAM is simply at its maximum; it is enough for the first generation or two, but when using a LoRA it runs out almost instantly. Apparently there is a bug somewhere that causes the model to be unloaded into RAM on every generation. After downgrading the webui version, the memory leaks no longer occur and images are generated normally :) As I understand it, this is because of PyTorch 2.1.2, which was added in webui version 1.8.0.

Let's wait for fixes, if there are any in the future.

You can download version 1.7.0 manually here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/v1.7.0

Related: https://github.com/lllyasviel/stable-diffusion-webui-forge/issues/500#issue-2171364987 and https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/15206#issue-2177705949
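If you go this route on an existing install, checking out the tag is just standard git (back up your config and outputs first); note that 1.7.0 targets an older torch than the 2.1.2 shipped with 1.8.0, so the venv may need its requirements reinstalled afterwards:

git fetch --tags
git checkout v1.7.0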

aearone avatar Mar 28 '24 17:03 aearone

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/15348

Les-Tin avatar Mar 29 '24 11:03 Les-Tin

I figured out what the problem is. The issue is a RAM leak in the latest version of the webui, i.e. during generation RAM usage grows every second until it reaches the swap file limit. In webui version 1.7.0 there was no such thing. RAM is simply at its maximum; it is enough for the first generation or two, but when using a LoRA it runs out almost instantly. Apparently there is a bug somewhere that causes the model to be unloaded into RAM on every generation. After downgrading the webui version, the memory leaks no longer occur and images are generated normally :) As I understand it, this is because of PyTorch 2.1.2, which was added in webui version 1.8.0.

Let's wait for fixes, if there are any in the future.

You can download version 1.7.0 manually here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/tree/v1.7.0

Related: lllyasviel/stable-diffusion-webui-forge#500 and #15206

I don't think torch 2.1.2 is the only cause of the issue, as I am having the same problem with my Intel Arc GPU, which uses torch 2.0.0a0+gite9ebda2 in webui version 1.8.

Stellaris12 avatar Apr 02 '24 16:04 Stellaris12

Can you check in windows event logs for any related messages? Python crash details? Resource exhaustion?

I already solved it. When I want to load an SDXL model, I launch with --medvram-sdxl, then I restart without --medvram-sdxl, and then I can use SDXL models. If I want to load another model, I restart with --medvram-sdxl again and repeat the process.

That doesn't sound like a solution, just a clunky workaround. The issue still exists if you have to do that.

Sorry, I can't give any debug logs. I am using Google Cloud, so I can get around this bug just by restarting the VM.

TongfanWeitf avatar Apr 02 '24 18:04 TongfanWeitf

I had a similar problem with RAM filling up after each generation when I tried to compare models with the X/Y/Z plot. With v1.8, increasing the number of models loaded into RAM crashed the system after a few generations. Memory is allocated after each generation even when the model is served from the cache. For me, using the old (obsolete) setting for increasing the number of cached models instead of the new one fixed the crashes.

(screenshot: settings)
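If I read this right, the two options being compared are probably the deprecated "Checkpoints to cache in RAM" (sd_checkpoint_cache in config.json) versus the newer "Maximum number of checkpoints loaded at the same time" (sd_checkpoints_limit); the key names are my assumption, not something confirmed in this thread. The workaround described above would then correspond to a config.json excerpt roughly like:

"sd_checkpoint_cache": 2,
"sd_checkpoints_limit": 1,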

bandifiu avatar Apr 04 '24 01:04 bandifiu

This might be similar to my issue, so I will post here instead of making a new issue.

(console log attached in collapsed section)

The events look like this:

Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: python.exe (52724) consumed 21200195584 bytes, msedge.exe (46448) consumed 6455615488 bytes, and python.exe (10724) consumed 6229950464 bytes.

Bytes converted in the screenshot. It should not consume that much, right? (screenshot: hc4i87vPW1)
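For scale (my arithmetic, dividing by 1024^3): 21200195584 bytes is roughly 19.7 GiB for python.exe (52724), 6455615488 bytes is about 6.0 GiB for msedge.exe, and 6229950464 bytes is about 5.8 GiB for the second python.exe, so the webui process is by far the largest consumer listed in that event.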

Event Viewer 1

Event Viewer 2

silveroxides avatar Apr 10 '24 16:04 silveroxides

@w-e-w You might want to look at this. Update: I can somewhat see what might be happening now. Several times now, when decoding the image with the VAE, it leaves shared memory occupied while apparently dropping some weights from the loaded checkpoint, which it then has to load back up when the next generation starts.

  1. End of normal run with decode start
  2. Decode where shared memory is used
  3. Decode finishes
  4. Normal loaded model weights
  5. Changing model

(screenshot: l5LBj9R7YK)

  1. End of normal run with decode start
  2. Decode where shared memory is used
  3. Decode finishes and something is left in shared while VRAM is below normal
  4. Initiating next generation
  5. Actually starting to generate image

(screenshot: LzlhJyOkto)

  1. Normal generation ongoing
  2. Decode start
  3. Decode using shared memory
  4. Same as previous with odd memory usage
  5. Sketchy workaround: opening settings and unloading the model to RAM and then back to VRAM gets it back to normal (a scripted version of this is sketched below)

(screenshot: K01KjQmr5t)
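That settings-page workaround can also be scripted. A rough sketch against the built-in API, assuming the webui was launched with --api on the default port, and assuming the settings Actions buttons correspond to the unload/reload checkpoint endpoints:

import requests

BASE = "http://127.0.0.1:7860"

# Move the loaded checkpoint out of VRAM into RAM...
requests.post(f"{BASE}/sdapi/v1/unload-checkpoint", timeout=600)

# ...then load it back, which (per the screenshots above) clears the stuck shared memory.
requests.post(f"{BASE}/sdapi/v1/reload-checkpoint", timeout=600)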

silveroxides avatar Apr 11 '24 23:04 silveroxides

I notice that the 1.9 release candidate made some changes regarding LoRAs; has anyone tested whether it's fixed there?

God-damnit-all avatar Apr 12 '24 16:04 God-damnit-all

I tested SDXL loras with 1.9 and RAM usage still skyrockets to the maximum (32 GB) and then it starts using virtual memory (using up to 20 GB of virtual memory). Only happens when using SDXL Loras.

Takezo1000 avatar Apr 14 '24 02:04 Takezo1000

I tested SDXL loras with 1.9 and RAM usage still skyrockets to the maximum (32 GB) and then it starts using virtual memory (using up to 20 GB of virtual memory). Only happens when using SDXL Loras.

Is this with the now-released 1.9?

God-damnit-all avatar Apr 14 '24 03:04 God-damnit-all

I tested SDXL loras with 1.9 and RAM usage still skyrockets to the maximum (32 GB) and then it starts using virtual memory (using up to 20 GB of virtual memory). Only happens when using SDXL Loras.

Is this with the now-released 1.9?

Yes, with 1.9 it still has this problem. I'm using --medvram --medvram-sdxl arguments with RX 6700 XT (AMD)

Takezo1000 avatar Apr 15 '24 16:04 Takezo1000

I tested SDXL loras with 1.9 and RAM usage still skyrockets to the maximum (32 GB) and then it starts using virtual memory (using up to 20 GB of virtual memory). Only happens when using SDXL Loras.

Is this with the now-released 1.9?

Yes, with 1.9 it still has this problem. I'm using --medvram --medvram-sdxl arguments with RX 6700 XT (AMD)

That's unfortunate. I'd downgrade to 1.7 but that has problems of its own. Could it be related to xformers? Try setting Cross attention optimization to None in Optimizations.

God-damnit-all avatar Apr 16 '24 00:04 God-damnit-all

I tested SDXL loras with 1.9 and RAM usage still skyrockets to the maximum (32 GB) and then it starts using virtual memory (using up to 20 GB of virtual memory). Only happens when using SDXL Loras.

Is this with the now-released 1.9?

Yes, with 1.9 it still has this problem. I'm using --medvram --medvram-sdxl arguments with RX 6700 XT (AMD)

That's unfortunate. I'd downgrade to 1.7 but that has problems of its own. Could it be related to xformers? Try setting Cross attention optimization to None in Optimizations.

The following is what I did, and it worked wonders. I get the feeling that the last two releases implemented a lot of arbitrary changes that had no justifiable reason to be merged into the main branch. This includes Spandrel, which has caused major slowdowns for a large share of users (you might not see them report it here, because non-tech-savvy users are put off by being asked to spend an hour learning how to properly submit an issue). I suspect they missed reading the full changelog for Torch releases 2.1.0-2.1.2 in regard to weights and multiple devices (in this case the odd loading between CPU and GPU). Anyway, I hope this helps.

Instructions:

1. Downgrade to 1.8.0.

2. Install CUDA Toolkit 11.8.

3. Install cuDNN ("extract the respective files into the CUDA install's bin, include and lib\x64 folders").

4. Click the Windows search bar, type "environment", then click "Edit system environment variables" in the results and edit the variables in the window that pops up. (screenshot: Editing variables)

5. In the lower part, check that CUDA_PATH is set to the same value as CUDA_PATH_V11_8, and add a variable named CUDNN with the following value (assuming you installed to the standard location; edit it if different):

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib\x64;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include;

6. After all this is done, reboot your PC once.

7. Open your webui root folder, open PowerShell and activate the venv:

.\venv\Scripts\activate

8. Then install torch 2.0.1 for CUDA 11.8:

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

9. Install xformers 0.0.22:

pip install xformers==0.0.22

10. Lastly, run a dependency check just in case something important conflicts (there rarely is):

pip check

11. Open webui-user.bat in a text editor and add --skip-install, just in case.
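Once the venv is active, a quick sanity check that the intended versions actually landed (my addition, not part of the original instructions; these are standard version queries):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import xformers; print(xformers.__version__)"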

Hope this works. Will try to check back later to see if all went well

silveroxides avatar Apr 24 '24 04:04 silveroxides

Hi, happened to me this weekend as well,

Tried rolling back from 1.9.3 to 1.9, then 1.8, as well as deleting the venv before coming here, but it seems it didn't work. Any time I try to generate on 1.5, the RAM goes up then returns to about a fourth of the total available. With SDXL it seems there is a memory leak: it climbs with each generation, sometimes lowering a little but eventually filling the total.

Tried with 16 and 32 GB of RAM and I get the same effect on both. Adding more LoRAs makes the server crash earlier. I tried switching to 1.5 models after each XL generation, but it seems to be mostly placebo. Tried switching off xformers, but no results either.

It pretty much always ends with an error message or with just "press any key to continue...". (screenshots: cmd_EjESY1zMy4, firefox_JPd8HggDnf)

JeffreyLebino avatar Apr 28 '24 18:04 JeffreyLebino

Hi, happened to me this weekend as well,

Tried rolling back from 1.9.3 to 1.9, then 1.8, as well as deleting the venv before coming here, but it seems it didn't work. Any time I try to generate on 1.5, the RAM goes up then returns to about a fourth of the total available. With SDXL it seems there is a memory leak: it climbs with each generation, sometimes lowering a little but eventually filling the total.

Tried with 16 and 32 GB of RAM and I get the same effect on both. Adding more LoRAs makes the server crash earlier. I tried switching to 1.5 models after each XL generation, but it seems to be mostly placebo. Tried switching off xformers, but no results either.

It pretty much always ends with an error message or with just "press any key to continue...". (screenshots: cmd_EjESY1zMy4, firefox_JPd8HggDnf)

I can attest to that. The same thing is happening to me. Same problem on version 1.8.0: the failures are fewer, but after a few generations there is still a RAM error. I found out that the problem lies in the --medvram flag; it frees video memory but loads the RAM very heavily, while VRAM barely reaches 6.5 GB (out of 8 in my case). If you remove this flag, generation takes a very long time. I don't know what's broken in the latest webui updates, but it's a fact: RAM leaks until it is 100% used, and if you use a LoRA it happens even faster.

aearone avatar Apr 28 '24 19:04 aearone

Noticed that this memory leak is not triggered during X/Y/Z script runs

nailz420 avatar Apr 29 '24 20:04 nailz420

OK, I may have a workaround that limits the number of crashes. While using Kohya I had the same issue, and it resolved both. Mind you, it still fills your RAM to an absurd amount (31.9/32 GB for me); it just seems to avoid the final crash when it tries to add more.

Disabling system memory fallback helps by stopping the GPU from dumping excess VRAM into RAM (it might even speed up your 1.5 generations if you haven't done it before). There's still something filling 99.9% of the system RAM, and SD still redlines all the time, but at least the GPU doesn't dump more into that last 0.1%.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion Just be sure to apply it to the right python.exe, the venv one or the main system one, depending on which one your SD uses.

JeffreyLebino avatar May 01 '24 10:05 JeffreyLebino

Noticed that this memory leak is not triggered during X/Y/Z script runs

Running X/Y/Z plot script actually clears out the VM

nailz420 avatar May 01 '24 21:05 nailz420

OK, I may have a workaround that limits the number of crashes. While using Kohya I had the same issue, and it resolved both. Mind you, it still fills your RAM to an absurd amount (31.9/32 GB for me); it just seems to avoid the final crash when it tries to add more.

Disabling system memory fallback helps by stopping the GPU from dumping excess VRAM into RAM (it might even speed up your 1.5 generations if you haven't done it before). There's still something filling 99.9% of the system RAM, and SD still redlines all the time, but at least the GPU doesn't dump more into that last 0.1%.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion Just be sure to apply it to the right python.exe, the venv one or the main system one, depending on which one your SD uses.

You're supposed to do that for SD regardless of this bug, especially with 8gb VRAM or less

nailz420 avatar May 02 '24 14:05 nailz420

You're supposed to do that for SD regardless of this bug, especially with 8gb VRAM or less

Sure, but OP and I both had crashes as well as the memory leak, hence why I shared it; two problems in one, sort of. I experienced far fewer issues with the fix. And SDXL had been working fine without it on my side so far.

JeffreyLebino avatar May 02 '24 17:05 JeffreyLebino

My webUI stopped hanging up when I removed the --medvram flag as well. It cuts my speed from 6.5 it/s to 3 it/s but at least it runs.

AlanMW avatar May 05 '24 07:05 AlanMW

I use Intelligent Standby List Cleaner (ISLC). 8 GB VRAM, 32 GB RAM. I can generate for about 20-30 minutes before crashing with --medvram-sdxl and xformers.

nailz420 avatar May 06 '24 17:05 nailz420

I use Intelligent standby list cleaner (ISLC). 8gb VRAM, 32gb RAM. I can generate for about 20-30 mins before crashing with medvram-sdxl and xformers.

So it's a massive problem. We should do something about it and draw the contributors' attention to it.

aearone avatar May 07 '24 04:05 aearone