stable-diffusion-webui

Consistently hangs after 6-7 minutes since yesterday

Open ifeelrobbed opened this issue 2 years ago • 50 comments

Describe the bug
Consistently hangs after 6-7 minutes since yesterday (10/15). Hopping onto the command line, the process is shown as killed. This happens both when starting with webui.sh and when starting with launch.py.

To Reproduce
Steps to reproduce the behavior: 6-7 minutes of activity in the web UI. The UI hangs and eventually the process is killed.

Expected behavior
Not hang?

Screenshots
Screenshot_20221016_083554

Desktop (please complete the following information):

  • OS: Pop OS 22.04
  • Browser: Firefox
  • Commit revision: fc220a51cf5bb5bfca83322c16e907a18ec59f6b

ifeelrobbed · Oct 16 '22 13:10

Memory leak maybe?

From /var/log/syslog:

Oct 16 12:28:53 pop-os kernel: [69600.171513] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-36.scope,task=python3.10,pid=48807,uid=1000
Oct 16 12:28:53 pop-os kernel: [69600.171634] Out of memory: Killed process 48807 (python3.10) total-vm:32734412kB, anon-rss:13482224kB, file-rss:65752kB, shmem-rss:14340kB, UID:1000 pgtables:37700kB oom_score_adj:0
Oct 16 12:28:55 pop-os systemd[1]: session-36.scope: A process of this unit has been killed by the OOM killer.
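
For anyone wanting to confirm the same thing on their machine, the OOM-killer messages can be pulled up with something like this (journalctl assumes a systemd-based distro; otherwise grep the syslog directly):

    # kernel messages mentioning the OOM killer, via the systemd journal
    journalctl -k | grep -i "out of memory"
    # or straight from syslog on distros that keep /var/log/syslog
    grep -i "out of memory" /var/log/syslog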

ifeelrobbed · Oct 16 '22 17:10

Watching memory climb as I run it. From restart to crash, with a little of the middle not in the screenshots.

Screenshot_20221016-125730_JuiceSSH Screenshot_20221016-125744_JuiceSSH

ifeelrobbed · Oct 16 '22 18:10

I had the same problem after updating (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2782). Restarted the computer and it seems to work fine. Try a restart!

(I have 32GB of memory and I don't think memory is the problem; I never hit the limit.)

jnpatrick99 · Oct 16 '22 18:10

Same issue here for about 2 days now. Running natively on Ubuntu. Sometimes the whole PC freezes completely and I have to hard reset. Sometimes it freezes for up to 40 seconds, and at times, when I keep the console as the active window, I get the same error output as yours and can restart the webui. 32GB RAM, i5-12600K, RX 6650 XT.

Edit: It has either been fixed or is related to "Radeon Profile" on Linux. No freezes since my last restart without Radeon Profile active.
Edit 2: Spoke too soon. The PC crashed again on the run right after the first edit. Not Radeon Profile related and not fixed yet.

Chilluminati91 · Oct 16 '22 18:10

Unfortunately rebooting didn't seem to change anything.

ifeelrobbed · Oct 16 '22 21:10

Unfortunately rebooting didn't seem to change anything.

Did you update Gradio and other stuff? It seems recent updates require new versions of libraries. pip install pip-upgrader and then run pip-upgrade; it will update the Python dependencies from the new requirements.txt.
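
Roughly, that would be something like the following (assuming the webui's venv lives in ./venv, which is where webui.sh creates it by default):

    # activate the webui's virtual environment (path assumed)
    source venv/bin/activate
    # install the helper, then upgrade the packages pinned in requirements.txt
    pip install pip-upgrader
    pip-upgrade requirements.txt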

jnpatrick99 · Oct 16 '22 23:10

Unfortunately rebooting didn't seem to change anything.

Did you update Gradio and other stuff? It seems recent updates require new versions of libraries. pip install pip-upgrader and then run pip-upgrade; it will update the Python dependencies from the new requirements.txt.

Went through those steps. Gradio was already up to date. It did update three others: fairscale, timm, transformers.

Screenshot_20221016_190100

Still maxed out memory and was killed.

ifeelrobbed · Oct 17 '22 00:10

Possibly a memory leak; as a workaround I need to create dynamic swapfiles of up to 10GB on my system.
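
For reference, a fixed-size swapfile can be set up roughly like this; the 10GB figure is just an example, and a truly dynamic setup would need a separate tool:

    # allocate a 10GB swapfile, restrict permissions, format it and enable it
    sudo fallocate -l 10G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile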

YudhaDev · Oct 17 '22 02:10

Same here, there is some memory leak, probably introduced around October 14-16; older commits don't have that issue.

Memory usage increases right after the generation of a batch starts, stays the same during generation, and increases again when the next batch is started by clicking the generate button.

jn-jairo · Oct 17 '22 07:10

Yes, same problem for me: it can eat up ~1GB of RAM per generation, which is never returned to the system, so shutting down Stable Diffusion and restarting it to reclaim that RAM becomes a regular necessity.

Running on an RTX 3060 12GB, 32GB RAM, Arch Linux.

futurevessel · Oct 17 '22 09:10

Dang. Got excited when I saw the commit "fix bug for latest model merge RAM improvement".

However, I still maxed out memory and swap, and the process was killed after ~6 minutes.

ifeelrobbed · Oct 17 '22 13:10

Having the same issue; after some time generating, the process will die with "webui.sh: line 141 (pid) killed".

From syslog:

Oct 18 21:43:09 DraxPC kernel: [ 5364.528101] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope,task=python3,pid=5445,uid=1000
Oct 18 21:43:09 DraxPC kernel: [ 5364.528182] Out of memory: Killed process 5445 (python3) total-vm:30828808kB, anon-rss:13574820kB, file-rss:70656kB, shmem-rss:14340kB, UID:1000 pgtables:35620kB oom_score_adj:0
Oct 18 21:43:09 DraxPC systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
Oct 18 21:43:09 DraxPC systemd[1163]: vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope: A process of this unit has been killed by the OOM killer.

Ryzen 5600X, 16GB RAM, GTX 1650 4GB VRAM, Linux Mint 21.whatever

drax-xard · Oct 19 '22 01:10

I found the problem: it is gradio 3.5. The leak starts with commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957, and downgrading gradio back to 3.4.1 solves it. I don't know what other changes were made for gradio 3.5 that might break by downgrading, but it is working well for me with the downgrade so far.

What do you think, @AUTOMATIC1111, can you check it out?

jn-jairo · Oct 19 '22 03:10

How would one go about downgrading gradio for the time being?

drax-xard · Oct 19 '22 16:10

I have this problem when I use --medvram (RAM fills up and then swap, until the system crashes), but not when I don't.

leandrodreamer · Oct 19 '22 19:10

Interesting. I'm using the following arguments: --medvram --opt-split-attention --force-enable-xformers

ifeelrobbed · Oct 19 '22 19:10

I have this problem when I use --medvram (RAM fills up and then swap, until the system crashes), but not when I don't.

lowvram and medvram offload parts of the model to the CPU when they are not being used by the GPU, so using them will use more RAM and less VRAM. It doesn't leak the memory, but you will need more RAM.

jn-jairo · Oct 19 '22 19:10

I have this problem when I use --medvram (RAM fills up and then swap, until the system crashes), but not when I don't.

lowvram and medvram offload parts of the model to the CPU when they are not being used by the GPU, so using them will use more RAM and less VRAM. It doesn't leak the memory, but you will need more RAM.

Hm, but when I start (and on the first generations) I have quite a lot of free RAM (about 6GB, plus 10GB of swap). Every image generated takes a little bit more, and after about 50 images it fills up. If it didn't leak, it should stay around the same RAM usage and not build up over time.

leandrodreamer · Oct 19 '22 19:10

I have this problem when I use --medvram (RAM fills up and then swap, until the system crashes), but not when I don't.

lowvram and medvram offload parts of the model to the CPU when they are not being used by the GPU, so using them will use more RAM and less VRAM. It doesn't leak the memory, but you will need more RAM.

Hm, but when I start (and on the first generations) I have quite a lot of free RAM (about 6GB, plus 10GB of swap). Every image generated takes a little bit more, and after about 50 images it fills up. If it didn't leak, it should stay around the same RAM usage and not build up over time.

@leandrodreamer as I said previously (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2858#issuecomment-1283381801), I found it is caused by the gradio upgrade, and downgrading it removes the leak.

I didn't say there is no leak. I said that by using lowvram/medvram you will use more RAM than without it, so the increase in memory due to lowvram/medvram is not a leak; it is supposed to happen.

I didn't identify any leak related to those options.

jn-jairo · Oct 19 '22 21:10

Oh, got it. What I find strange is that I don't have any leak problems without the --medvram param; I can make hundreds of images no problem (without downgrading gradio). Maybe it's a mix of the new gradio version and that param? Or maybe I have a completely different problem here, idk :b

leandrodreamer · Oct 19 '22 22:10

@leandrodreamer yes, it may be a mix of settings. You can try to revert commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957 to test whether your problem is the same one I identified or something else.
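
If you want to try that, the revert itself is a single command run inside the repo (conflicts in the requirements files, if any, should be resolved in favour of the gradio 3.4.1 pin):

    # undo the commit that bumped gradio to 3.5 on top of the current checkout
    git revert 4ed99d599640bb86bc793aa3cbed31c6d0bd6957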

jn-jairo · Oct 19 '22 22:10

I just deactivated and deleted the venv, reverted to 7d6042b908c064774ee10961309d396eabdc6c4a (the last commit before Gradio 3.5), commented out the line in webui.sh that performs git pull, and let it reinstall everything. Memory usage is steady and I am generating images just fine again.
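
Roughly, the steps were along these lines (paths assume the default layout that webui.sh creates):

    # remove the old virtual environment so everything gets reinstalled
    rm -rf venv
    # check out the last commit before the Gradio 3.5 bump
    git checkout 7d6042b908c064774ee10961309d396eabdc6c4a
    # (also comment out the git pull line in webui.sh so it stays on this commit)
    # relaunch; webui.sh rebuilds the venv and reinstalls the dependencies
    ./webui.sh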

ifeelrobbed · Oct 20 '22 04:10

Alright, after a day of no issues, I performed a git pull, modified requirements.txt and requirements_versions.txt back to gradio==3.4.1, and commented out the git pull line in webui.sh. So far so good. The only change from the latest commit should be the gradio downgrade, and memory usage is steady.
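
Concretely, the downgrade amounts to something like this (the explicit pip step is an assumption; you can also just let launch.py reinstall from the edited requirements):

    # in requirements.txt and requirements_versions.txt, change the gradio entry to:
    #   gradio==3.4.1
    # then, with the venv active, install that version explicitly:
    pip install gradio==3.4.1
    # and comment out the git pull line in webui.sh so an update doesn't undo the pin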

ifeelrobbed · Oct 21 '22 13:10

Haven't had issues since yesterday evening; seems to be fixed.

Chilluminati91 · Oct 21 '22 15:10

Still had the same problem, nothing changed after latest git pull. Decided to reinstall from scratch, and lo and behold, no more memory leaks.

futurevessel · Oct 21 '22 18:10

Still had the same problem, nothing changed after latest git pull. Decided to reinstall from scratch, and lo and behold, no more memory leaks.

Sadly it didn't work for me; I reinstalled everything and the leak persists with the latest master commit.

jn-jairo · Oct 22 '22 00:10

Ok, so I ran automatic1111 through this docker image: https://github.com/AbdBarho/stable-diffusion-webui-docker

And it had the same problem for me, eating RAM. So I went back and compared with my previous installation of automatic1111 (I backed it up when I reinstalled), and the only difference was that in webui-user.sh I had the --medvram parameter.

So I edited the docker-compose.yml in the docker image and removed --medvram, and now there are no more leaks. Then I added --medvram to my reinstalled local version and it leaks memory again. So for me, just like leandrodreamer stated in this thread, --medvram is the culprit.

Now, I have 12GB of VRAM, so not being able to use --medvram isn't that much of a problem, but for those with less VRAM, not being able to use it might be a pain or even make it impossible to run?

futurevessel · Oct 22 '22 13:10

Yeah, with my 2060 I have to use --medvram for it to work at all. The only way I've found to prevent the memory leak, regardless of which commit I revert to, is to force Gradio 3.4.1.

ifeelrobbed · Oct 22 '22 14:10

Same thing happening to me. Manually downgrading gradio to 3.4.1 via pip seems to fix this problem.

Running in Docker on Linux, 32GB system RAM, RX 580 4GB.

floorcat · Oct 23 '22 23:10

Is this an issue in gradio (upstream) or an issue with how this repo uses gradio?

Downgrading gradio apparently fixes the issue, which strongly suggests that the issue is upstream.

slix · Oct 26 '22 01:10