stable-diffusion-webui
stable-diffusion-webui copied to clipboard
Consistently hangs after 6-7 minutes since yesterday
Describe the bug Consistently hangs after 6-7 minutes since yesterday (10/15). Hopping onto the command line, the process is shown as killed. This happens whether I start with webui.sh or launch.py.
To Reproduce Steps to reproduce the behavior: 6 - 7 minutes of activity in the web UI. The UI hangs and eventually the process is killed.
Expected behavior Not hang?
Screenshots
Desktop (please complete the following information):
- OS: Pop OS 22.04
- Browser: Firefox
- Commit revision: fc220a51cf5bb5bfca83322c16e907a18ec59f6b
Memory leak maybe?
From /var/log/syslog:
Oct 16 12:28:53 pop-os kernel: [69600.171513] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-36.scope,task=python3.10,pid=48807,uid=1000
Oct 16 12:28:53 pop-os kernel: [69600.171634] Out of memory: Killed process 48807 (python3.10) total-vm:32734412kB, anon-rss:13482224kB, file-rss:65752kB, shmem-rss:14340kB, UID:1000 pgtables:37700kB oom_score_adj:0
Oct 16 12:28:55 pop-os systemd[1]: session-36.scope: A process of this unit has been killed by the OOM killer
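(For anyone wanting to check for the same thing on their own machine, the OOM-killer messages can be pulled from the kernel log; this assumes a systemd-based distro:)
journalctl -k -b | grep -i "out of memory"   # kernel messages from the current boot
dmesg | grep -i oom                          # alternative if journalctl is unavailable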
Watching memory climb as I run it. From restart to crash, with a little of the middle missing from the screenshots.
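A simple way to watch the resident memory of the webui process over time, assuming it shows up as python3/python3.10 (the RSS column is in KiB):
watch -n 5 'ps -o pid,rss,cmd -C python3,python3.10 --sort=-rss | head'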
I had the same problem after updating (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2782). Restarted the computer and it seems to work fine now. Try a restart!
(I have 32 GB of memory and I don't think memory is the problem; I never hit the limit.)
Same issue here for about 2 days now. Running natively on Ubuntu. Sometimes the whole PC freezes completely and I have to hard reset. Sometimes it freezes for up to 40 seconds, and when I keep the console as the active window I get the same error output as yours and can restart the webui. 32 GB RAM, i5-12600K, RX 6650 XT.
Edit: It has either been fixed or it was related to "Radeon Profile" on Linux. No freezes since my last restart without Radeon Profile active. Edit 2: Spoke too soon. The PC crashed again on the run right after the first edit. Not Radeon Profile related and not fixed yet.
Unfortunately rebooting didn't seem to change anything.
Did you update Gradio and the other libraries? Recent updates seem to require new versions of the dependencies. Run pip install pip-upgrader and then pip-upgrade; it will update the Python dependencies from the new requirements.txt.
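For reference, assuming the default venv inside the repo folder, the full sequence would look something like:
source venv/bin/activate    # activate the webui virtualenv first
pip install pip-upgrader
pip-upgrade                 # detects requirements.txt and upgrades the pinned packages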
Went through those steps. Gradio was already up to date. It did update three others: fairscale, timm, transformers.
Still maxed out memory and was killed.
Possibly a memory leak; to work around it I need to create dynamic swapfiles of up to 10 GB on my system.
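If anyone else wants to use extra swap as a stopgap, a one-off 10 GB swapfile can be set up like this (path and size are only an example):
sudo fallocate -l 10G /swapfile-sd   # or: sudo dd if=/dev/zero of=/swapfile-sd bs=1M count=10240
sudo chmod 600 /swapfile-sd
sudo mkswap /swapfile-sd
sudo swapon /swapfile-sd             # lasts until reboot; add an /etc/fstab entry to keep it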
Same here, there is some memory leak, probably introduced between the 14th and the 16th; older commits don't have that issue.
Memory usage increases right after batch generation starts, stays the same during generation, and increases again when the next batch is started by clicking the Generate button.
Yes, same problem for me. It can eat up ~1 GB of RAM per generation, which is never returned to the system, so shutting down Stable Diffusion and restarting it to reclaim that RAM becomes a regular necessity.
Running on an RTX 3060 12 GB, 32 GB RAM, Arch Linux.
Dang. Got excited when I saw the commit "fix bug for latest model merge RAM improvement".
However, I still maxed out memory and swap, and the process was killed after ~6 minutes.
Having the same issue; after some time generating, the process will die with "webui.sh: line 141: (pid) killed".
syslog:
Oct 18 21:43:09 DraxPC kernel: [ 5364.528101] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope,task=python3,pid=5445,uid=1000
Oct 18 21:43:09 DraxPC kernel: [ 5364.528182] Out of memory: Killed process 5445 (python3) total-vm:30828808kB, anon-rss:13574820kB, file-rss:70656kB, shmem-rss:14340kB, UID:1000 pgtables:35620kB oom_score_adj:0
Oct 18 21:43:09 DraxPC systemd[1]: user@1000.service: A process of this unit has been killed by the OOM killer.
Oct 18 21:43:09 DraxPC systemd[1163]: vte-spawn-688059d0-5100-4a81-983d-15c959b6b48a.scope: A process of this unit has been killed by the OOM killer.
Ryzen 5600X, 16 GB RAM, GTX 1650 4 GB VRAM, Linux Mint 21.whatever.
I found the problem: it is gradio 3.5. The leak starts at commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957, and downgrading gradio back to 3.4.1 solves it. I don't know what other changes were made for gradio 3.5 that might break with the downgrade, but it has been working well for me so far.
What do you think, @AUTOMATIC1111? Can you check it out?
How would one go about downgrading gradio for the time being?
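One way to do it, assuming a stock install with the venv inside the repo folder (untested sketch; later comments in this thread describe the same approach):
cd stable-diffusion-webui
source venv/bin/activate
pip install gradio==3.4.1    # force the older gradio in the existing venv
You may also need to change the gradio line in requirements.txt and requirements_versions.txt to gradio==3.4.1 so the launcher doesn't pull 3.5 back in on the next start.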
I have this problem when I use --medvram (RAM fills up and then swap, until the system crashes), but not when I don't.
Interesting. I'm using the following arguments:
--medvram --opt-split-attention --force-enable-xformers
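For anyone comparing setups, these flags usually live in webui-user.sh via COMMANDLINE_ARGS, e.g.:
# webui-user.sh
export COMMANDLINE_ARGS="--medvram --opt-split-attention --force-enable-xformers"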
Regarding --medvram: lowvram and medvram offload parts of the model to the CPU when they are not being used by the GPU, so using them trades VRAM for RAM. That in itself is not a leak, but you will need more RAM.
Hm, but when I start (and on the first generations) I have quite a lot of free RAM (about 6 GB plus 10 GB of swap); every image generated takes a little more, and after around 50 images it fills up. If it didn't leak, RAM usage should stay roughly constant instead of building up over time.
@leandrodreamer as I said previously (https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2858#issuecomment-1283381801), I found it is caused by the gradio upgrade, and downgrading it removes the leak.
I didn't say there is no leak; I said that by using lowvram/medvram you will use more RAM than without it, so the increase in memory due to lowvram/medvram itself is not a leak, it is supposed to happen.
I didn't identify any leak related to those options.
Oh, got it. What I find strange is that I don't have any leak problems without the --medvram param; I can make hundreds of images no problem (without downgrading gradio). Maybe it's a mix of the new gradio version and that param? Or maybe I have a completely different problem here, idk :b
@leandrodreamer yes, it may be a mix of settings. You can try reverting commit 4ed99d599640bb86bc793aa3cbed31c6d0bd6957 to test whether your problem is the same one I identified or something else.
I just deactivated and deleted the venv, reverted to 7d6042b908c064774ee10961309d396eabdc6c4a (the last commit before Gradio 3.5), commented out the line in webui.sh that performs git pull, and let it reinstall everything. Memory usage is steady and I am generating images just fine again.
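If anyone wants to try the same rollback, the steps amount to something like this (assuming no local changes you need to keep):
cd stable-diffusion-webui
git checkout 7d6042b908c064774ee10961309d396eabdc6c4a   # last commit before Gradio 3.5
rm -rf venv                                              # force a clean reinstall of dependencies
./webui.sh                                               # recreates the venv on first launch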
Alright, after a day of no issues, I performed a git pull, modified requirements.txt and requirements_versions.txt back to gradio==3.4.1, and commented out the git pull line in webui.sh. So far so good. The only change from the latest commit should be the gradio downgrade and memory usage is steady.
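A quick way to confirm the downgrade actually took effect inside the venv:
source venv/bin/activate
pip show gradio | grep -i version                     # should report 3.4.1
python -c "import gradio; print(gradio.__version__)"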
Haven't had issues since yesterday evening; seems to be fixed.
Still had the same problem, nothing changed after latest git pull. Decided to reinstall from scratch, and lo and behold, no more memory leaks.
Sadly it didn't work for me; I reinstalled everything and the leak persists on the latest master commit.
Ok, so I ran automatic1111 through this docker image: https://github.com/AbdBarho/stable-diffusion-webui-docker
And it had the same problem for me, eating RAM. So I went back and compared against my previous installation of automatic1111 (I backed it up when I reinstalled), and the only difference was that in webui-user.sh I had the --medvram parameter.
So I edited the docker-compose.yml in the docker image and removed --medvram, and now there are no more leaks. Then I added --medvram to my reinstalled local version and it leaks memory again. So for me, just as leandrodreamer stated in this thread, --medvram is the culprit.
Now, I have 12 GB of VRAM, so not being able to use --medvram isn't much of a problem for me, but for those with less VRAM it might be a pain or even make it impossible to run?
Yeah, with my 2060 I have to use --medvram for it to work at all. The only way I've found to prevent the memory leak, regardless of which commit I revert to, is to force Gradio 3.4.1.
Same thing happening to me. Manually downgrading gradio to 3.4.1 via pip seems to fix this problem.
Running in Docker on Linux, 32 GB system RAM, RX 580 4 GB.
Is this an issue in gradio (upstream) or an issue with how this repo uses gradio?
Downgrading gradio apparently fixes the issue, which strongly suggests that the issue is upstream.