sd-webui-controlnet
[Bug]: RAM Memory leak issue - RAM consumption keeps increasing
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui
What happened?
When I use the extension with the model cache switched off, the SSH connection to my EC2 instance fails after about 10 generations. This always happens inside the build_controlnet_model function: the "Loading model: {model}" message gets printed, but the "Loaded state_dict" message does not.
Steps to reproduce the problem
- Use AWS EC2 instance
- Start the whole program with bash webui.sh (no args), set model cache off, apply settings and restart
- Generate images using img2img roughly 10 times
- Somewhere in between, the SSH connection gets disconnected right after printing "Loading model: [model name]". This is not limited to one model; it happens across all models.
What should have happened?
The SSH connection should not have been closed. There seems to be some error here. It happens suddenly, without a reproducible number of steps, but it does fail. It doesn't fail with plain automatic1111, so it is a ControlNet extension problem.
Commit where the problem happens
webui: [22bcc7be] controlnet: 2270f364e167b9531daf9a8bd1d62cb2dbfa4d00
What browsers do you use to access the UI ?
Google Chrome
Command Line Arguments
No
Console logs
Usually it is supposed to be:
Loading model: control_v11p_sd15_inpaint [ebff9138]
Loaded state_dict from [/home/ec2-user/stable-diffusion-webui/models/ControlNet/control_v11p_sd15_inpaint.pth]
Loading config: /home/ec2-user/stable-diffusion-webui/extensions/sd-webui-controlnet/models/control_v11p_sd15_inpaint.yaml
ControlNet model control_v11p_sd15_inpaint [ebff9138] loaded.
But while it fails it stops at:
Loading model: control_v11p_sd15_inpaint [ebff9138]
The rest of the log isn't visible and SSH gets disconnected.
Additional information
Happens with model cache as well. But after many more tries
can you track your memory use
I did. I still had close to 11GB of VRAM available.
Adding the type of log which I got:
==============NVSMI LOG==============
Timestamp : Sun Apr 23 14:05:44 2023
Driver Version : 515.65.01
CUDA Version : 11.7
Attached GPUs : 1
GPU 00000000:00:1E.0
    FB Memory Usage
        Total : 15360 MiB
        Reserved : 388 MiB
        Used : 7109 MiB
        Free : 7861 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 5 MiB
        Free : 251 MiB
The program is currently running, and I check every 5 seconds. When the SSH connection was lost, the free memory was ~11500 MiB.
please track your memory use, not GPU memory use.
@lllyasviel SSH failed again, this time when running with the model cache. The prompt and other input params are below. I ran the same inputs with different seeds and it failed on the 14th try. This is with cache on.
Handsome Indian man wearing red colour specs Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 562784315, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.75, Mask blur: 4, ControlNet-0 Enabled: True, ControlNet-0 Module: depth_zoe, ControlNet-0 Model: control_v11f1p_sd15_depth [cfd03158], ControlNet-0 Weight: 1, ControlNet-0 Guidance Start: 0, ControlNet-0 Guidance End: 1, ControlNet-1 Enabled: True, ControlNet-1 Module: canny, ControlNet-1 Model: control_v11p_sd15_canny [d14c016b], ControlNet-1 Weight: 1, ControlNet-1 Guidance Start: 0, ControlNet-1 Guidance End: 1, ControlNet-2 Enabled: True, ControlNet-2 Module: softedge_pidinet, ControlNet-2 Model: control_v11p_sd15_softedge [a8575a2a], ControlNet-2 Weight: 1, ControlNet-2 Guidance Start: 0, ControlNet-2 Guidance End: 1
processing | 138.8/7.0s Time taken: 27.07s Torch active/reserved: 5469/5814 MiB, Sys VRAM: 6836/14972 MiB (45.66%)
I kept running the same set of models and params again and again. Then it failed on the 14th try.
Console Log:
Loading model from cache: control_v11f1p_sd15_depth [cfd03158]██████████████| 16/16 [00:22<00:00, 1.48s/it]
Loading preprocessor: depth_zoe
Pixel Perfect Mode Enabled.
resize_mode = ResizeMode.RESIZE
raw_H = 585
raw_W = 585
target_H = 512
target_W = 512
estimation = 512.0
preprocessor resolution = 512
Loading model from cache: control_v11p_sd15_canny [d14c016b]
Loading preprocessor: canny
preprocessor resolution = 512
Loading model from cache: control_v11p_sd15_softedge [a8575a2a]
Loading preprocessor: pidinet
Pixel Perfect Mode Enabled.
resize_mode = ResizeMode.RESIZE
raw_H = 585
raw_W = 585
target_H = 512
target_W = 512
estimation = 512.0
preprocessor resolution = 512
0%| | 0/16 [00:00<?, ?it/s]
Okay, when you say memory use, do you mean the system's RAM and the total amount of space in it?
yes, RAM
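One way to track system RAM for the webui process is to log its resident set size (RSS) after each generation and check whether it keeps climbing. The sketch below is a minimal, Linux-only example that reads /proc/<pid>/status. All function names are hypothetical and are not part of the extension or the webui.

```python
import os
import re
from typing import List, Optional


def parse_vmrss_kib(status_text: str) -> int:
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def current_rss_mib(pid: Optional[int] = None) -> float:
    """Read the resident set size of a process in MiB (Linux only)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status", encoding="ascii") as handle:
        return parse_vmrss_kib(handle.read()) / 1024


def keeps_growing(samples: List[float], min_step_mib: float = 1.0) -> bool:
    """True if every sample exceeds the previous one by at least min_step_mib."""
    return len(samples) >= 2 and all(
        later - earlier >= min_step_mib
        for earlier, later in zip(samples, samples[1:])
    )
```

Calling current_rss_mib() with the webui's pid after every generation and passing the collected values to keeps_growing() gives a rough signal of a leak; a flat or sawtooth pattern suggests normal caching instead.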
@lllyasviel you were right. There is an issue with the RAM. I have attached a screenshot of the memory usage below. Is there any way to fix this? I am not running any program other than controlnet + automatic1111. Basically, every time I click generate, the RAM usage increases. After it reached 90%, I tried once more and it failed. Please let me know how to proceed. Thanks. I am using a g4dn instance, so it has 16 GB RAM.

it seems to be a memory leak in some preprocessor
If you have any idea on how to solve this please let me know, I will try to see if it fixes the issue.
I am also going to run some tests with different preprocessors to see which one might be the issue. I will also try without any preprocessor and see whether RAM usage still increases constantly. Will get back with those results.
@lllyasviel It has something to do with the Low VRAM setting, as per my initial observation. I generated images back to back for 10 mins with the canny model and preprocessor. Memory consumption was very stable. All the images below have the same time axis on X.

And then I switched on the Low VRAM setting and got the following memory consumption.

I saw very similar behaviour with canny + depth_zoe when Low VRAM was switched on for depth_zoe.

I will check again with other preprocessors without the Low VRAM setting to see if the RAM consumption is stable. Let me know your views.
thanks for the data we will take a look soon
just an idea: I had previously reported that clip_vision had a memory leak due to not using "with torch.no_grad():". This was for CN1.0; I'm not sure if it has already been added, and it may apply to other annotators.
@tkalayci71 can you explain where this might need to be added for me to check? The file or folder or more information would prove very useful.
@lllyasviel I have been running the code for more than an hour without the Low VRAM setting, and so far there has been no random increase in system RAM consumption at all. I strongly feel this might be an implementation issue with Low VRAM.
@ghpkishore it's in annotator/clip/__init__.py: the last 3 lines of the apply function need to be wrapped inside torch.no_grad. But I wouldn't recommend modifying the code, they will probably solve it soon.
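The suggestion above amounts to running an annotator's forward pass inside torch.no_grad(), so that no autograd graph is built and kept alive between calls. The sketch below shows the pattern only; it is not the code from the clip annotator, run_inference is a hypothetical helper, and the nullcontext fallback exists only so the example also runs where PyTorch is not installed.

```python
import contextlib

try:
    import torch  # optional here so the sketch also runs without PyTorch
except ImportError:
    torch = None


def run_inference(model, inputs):
    """Call `model` on `inputs` with autograd disabled when PyTorch is present."""
    # torch.no_grad() stops PyTorch from recording operations for backprop,
    # so intermediate tensors are not retained by a computation graph.
    guard = torch.no_grad() if torch is not None else contextlib.nullcontext()
    with guard:
        return model(inputs)
```

Whether this removes the growth reported in this thread is not confirmed here; it only addresses leaks caused by retained autograd state.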
@lllyasviel by the way, see also: https://github.com/huggingface/transformers/issues/20636
@lllyasviel similar to how the program is killed if VRAM runs out, is it possible to add a check for system RAM as well? That is, if system RAM usage exceeds 95%, kill the program?
I wrote here that even if I turn off ControlNet, clearing the Enabled checkbox does not remove the model from VRAM; it still uses +3 GB! https://github.com/vladmandic/automatic/discussions/386#discussioncomment-5762338
I do not think cn has a vram leak problem. When pytorch moves a model off the gpu, it does not release the vram to the OS - it just marks that vram as unoccupied, and all other code can use it even if it looks occupied in the OS monitor. But cn may have some ram issue, and we will take a look, considering our workloads.
a possible test would be to mock the pytorch modules so that they perform no-ops, or basic trivial operations, while measuring memory use.
if we can measure memory use in a profiler while using pytorch, and then measure it again while mocking pytorch, it would possibly help. but i don't know enough about the internals to pull this off.
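A system RAM check of the kind asked for here could look roughly like the sketch below. It is Linux-only, relies on the MemTotal and MemAvailable fields of /proc/meminfo, and is not an existing feature of the webui or the extension; every function name is hypothetical. The 95% limit is taken from the question above.

```python
import os
import signal


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field name: value in KiB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def ram_used_percent(meminfo: dict) -> float:
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def should_stop(meminfo_text: str, limit_percent: float = 95.0) -> bool:
    """True if RAM use is at or above the limit."""
    return ram_used_percent(parse_meminfo(meminfo_text)) >= limit_percent


def check_and_stop(limit_percent: float = 95.0) -> None:
    """Send SIGTERM to this process if RAM use is above the limit (Linux only)."""
    with open("/proc/meminfo", encoding="ascii") as handle:
        if should_stop(handle.read(), limit_percent):
            os.kill(os.getpid(), signal.SIGTERM)
```

Calling check_and_stop() before each generation would end the process before the machine becomes unresponsive, which is likely what makes the SSH session drop; it does not fix the underlying growth.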
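The mock-and-measure idea can be sketched with the standard library alone. In the example below, Preprocessor is a made-up stand-in that leaks on purpose, and the comparison swaps its run method for a no-op. One caveat: tracemalloc only sees allocations made through Python's allocator, so native tensor memory held by PyTorch would not show up and would have to be compared through process RSS instead.

```python
import tracemalloc
from unittest import mock


class Preprocessor:
    """Hypothetical stand-in for an annotator; it leaks by keeping every result."""

    def __init__(self):
        self._kept = []

    def run(self, image):
        result = bytearray(100_000)
        self._kept.append(result)  # deliberate leak for demonstration
        return result


def python_heap_growth(fn, runs: int = 20) -> int:
    """Return the growth in traced Python memory (bytes) over `runs` calls of fn."""
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(runs):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


def compare_real_and_mocked(runs: int = 20):
    """Measure growth with the real method, then with a no-op replacement."""
    preprocessor = Preprocessor()
    real = python_heap_growth(lambda: preprocessor.run(None), runs)
    with mock.patch.object(Preprocessor, "run", new=lambda self, image: None):
        mocked = python_heap_growth(lambda: preprocessor.run(None), runs)
    return real, mocked
```

If growth disappears once a component is mocked out, that component is a likely source of the leak; if it persists, the leak is elsewhere in the pipeline.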
Hi,
I created a highly optimized version of ControlNet v1.1.232. You can use this version with 4GB VRAM, with at most 2 Multi ControlNet units and Hires. fix. All added and changed parts are marked with "Hikmet Koyuncu".
Extract the "webui" directory into your AUTOMATIC1111 "webui" directory and overwrite the files.
You must first convert your ControlNet preprocessors and ControlNet models to fp16 format.
For ControlNet models, you can use my edited "extract_controlnet.py" file. You must use the "--half" and "--convert" arguments.
For ControlNet preprocessors (annotators), you can use the "convert_controlnet_preprocessor_fp16.py" file.
Example:
python.exe "convert_controlnet_preprocessor_fp16.py" --src "myPreprocessor.pth" --dst "myPreprocessor_fp16.pth"
Link: https://www.mediafire.com/file/ihjr4gcg2wy2fm1/Optimized_ControlNet_v1.1.232_by_Hikmet_Koyuncu.zip/file
@lllyasviel, I've been having RAM problems for a long time, and recently it became quite serious since I increased the number of controlNet modules. Specifically,
- In the past, when webui only used 2 modules (inpaint, depth), my server (32G RAM) would run out of RAM after about 17 hours of continuously creating images.
- Since yesterday, when I added the scribble module (total: inpaint, depth, scribble), my server (32G RAM) runs out of RAM after about 2 hours of continuously creating images.
I tried adding 10G of swap memory, but RAM still fills up soon.
This is a metric that tracks RAM usage in percentage in last 7 days:
P/s: I'm pretty sure the problem is controlNet, because I have another server that doesn't use controlNet; it also creates images continuously but doesn't run out of RAM.
ControlNet loads models into VRAM but does not remove them, so your VRAM usage increases each time. I published a fixed version.
Hi @hikmet-koyuncu, my VRAM is fine, but my RAM is not. As the title of this issue says, this is a RAM problem. Thanks.
Yes. ControlNet moves some models from VRAM to RAM after image creation (some models are not moved, which is a bug), but never removes them. I fixed this problem.
Hi @hikmet-koyuncu, after updating the controlnet extension to https://github.com/Mikubill/sd-webui-controlnet/commit/fce6775a6dddef52ecd658259e909687d9dedf72, the memory leak issue is still not resolved. More specifically, this is how I use ControlNet via the API:
- GPU A10 24G VRAM, 32G RAM.
- generate ~30k images per day.
- Inpaint:
"alwayson_scripts": {
"controlnet": {
"args": [
{
"module": "inpaint_only",
"model": "control_v11p_sd15_inpaint [ebff9138]",
"control_mode": "ControlNet is more important"
}
]
}
}
- Depth:
"alwayson_scripts": {
"controlnet": {
"args": [
{
"module": "depth",
"model": "control_v11f1p_sd15_depth [cfd03158]",
"control_mode": "ControlNet is more important"
}
]
}
}
- scribble:
"alwayson_scripts": {
"controlnet": {
"args": [
{
"module": "none",
"model": "control_v11p_sd15_scribble [d4ba51ff]",
"control_mode": "ControlNet is more important"
}
]
}
}
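For reference, fragments like the ones above nest under "alwayson_scripts" in the request body. The sketch below only builds such a body in Python; the helper names and the placeholder prompt are assumptions, only the three ControlNet fields shown in this comment are used, and sending the request is left out.

```python
import json


def controlnet_unit(module: str, model: str,
                    control_mode: str = "ControlNet is more important") -> dict:
    """Build one ControlNet unit using only the fields shown in this thread."""
    return {"module": module, "model": model, "control_mode": control_mode}


def build_payload(prompt: str, units: list) -> dict:
    """Nest the ControlNet units under alwayson_scripts -> controlnet -> args."""
    return {
        "prompt": prompt,
        "alwayson_scripts": {"controlnet": {"args": units}},
    }


payload = build_payload(
    "example prompt",  # placeholder, not taken from the thread
    [controlnet_unit("depth", "control_v11f1p_sd15_depth [cfd03158]")],
)
body = json.dumps(payload)  # serialised request body
```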
logs:
2023-11-07 04:17:03,611 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_inpaint [ebff9138]
2023-11-07 04:17:03,620 - ControlNet - WARNING - A1111 inpaint and ControlNet inpaint duplicated. ControlNet support enabled.
2023-11-07 04:17:03,621 - ControlNet - INFO - Loading preprocessor: inpaint
2023-11-07 04:17:03,621 - ControlNet - INFO - preprocessor resolution = -1
2023-11-07 04:17:03,689 - ControlNet - INFO - ControlNet Hooked - Time = 0.0996100902557373
100%|██████████| 22/22 [00:02<00:00, 8.60it/s]
Total progress: 100%|██████████| 22/22 [00:02<00:00, 7.91it/s]
2023-11-07 04:17:10,561 - ControlNet - INFO - Loading model: control_v11p_sd15_scribble [d4ba51ff]
2023-11-07 04:17:15,289 - ControlNet - INFO - Loaded state_dict from [/app/extensions/sd-webui-controlnet/models/control_v11p_sd15_scribble.pth]
2023-11-07 04:17:15,289 - ControlNet - INFO - controlnet_default_config
2023-11-07 04:17:17,925 - ControlNet - INFO - ControlNet model control_v11p_sd15_scribble [d4ba51ff] loaded.
2023-11-07 04:17:18,009 - ControlNet - INFO - Loading preprocessor: none
2023-11-07 04:17:18,009 - ControlNet - INFO - preprocessor resolution = -1
2023-11-07 04:17:18,039 - ControlNet - INFO - ControlNet Hooked - Time = 7.499300956726074
100%|██████████| 25/25 [00:03<00:00, 7.98it/s]
Total progress: 100%|██████████| 25/25 [00:03<00:00, 7.32it/s]
2023-11-07 04:17:21,814 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_inpaint [ebff9138]
2023-11-07 04:17:21,823 - ControlNet - WARNING - A1111 inpaint and ControlNet inpaint duplicated. ControlNet support enabled.
2023-11-07 04:17:21,824 - ControlNet - INFO - Loading preprocessor: inpaint
2023-11-07 04:17:21,824 - ControlNet - INFO - preprocessor resolution = -1
2023-11-07 04:17:21,898 - ControlNet - INFO - ControlNet Hooked - Time = 0.1059107780456543
100%|██████████| 22/22 [00:02<00:00, 10.14it/s]
Total progress: 100%|██████████| 22/22 [00:02<00:00, 9.52it/s]
100%|██████████| 19/19 [00:03<00:00, 5.12it/s]
Total progress: 100%|██████████| 19/19 [00:03<00:00, 5.00it/s]
100%|██████████| 13/13 [00:00<00:00, 13.21it/s]
Total progress: 100%|██████████| 13/13 [00:01<00:00, 9.21it/s]
2023-11-07 04:17:30,698 - ControlNet - INFO - Loading model from cache: control_v11p_sd15_inpaint [ebff9138]
2023-11-07 04:17:30,706 - ControlNet - WARNING - A1111 inpaint and ControlNet inpaint duplicated. ControlNet support enabled.
2023-11-07 04:17:30,707 - ControlNet - INFO - Loading preprocessor: inpaint
2023-11-07 04:17:30,707 - ControlNet - INFO - preprocessor resolution = -1
2023-11-07 04:17:30,785 - ControlNet - INFO - ControlNet Hooked - Time = 0.11857032775878906
Hi,
I added a "Broom" icon. If you click it, RAM and VRAM will be cleared.
I don't want to clear RAM every time, because this can slow down the workflow. When you get a RAM error, click the broom icon. This clears VRAM and RAM, and prints the current RAM and VRAM amounts in the DOS console window.
Also, if you are using fp32 models and you have a small amount of RAM, you should use fp16 models. You can convert fp32 models to fp16 models. I shared this Python program too.
I am using 16 GB RAM and 4 GB VRAM and I can use 2 ControlNet units at the same time.
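The code behind the broom icon is not shown in this thread. A manual cleanup of this kind usually combines Python's garbage collector with PyTorch's torch.cuda.empty_cache(), roughly as in the sketch below. This is an assumption about the general approach, not the poster's implementation, and free_memory is a hypothetical name.

```python
import gc

try:
    import torch  # optional: only needed for the VRAM part
except ImportError:
    torch = None


def free_memory() -> int:
    """Run the garbage collector and release cached CUDA memory if possible.

    Returns the number of unreachable objects found by gc.collect().
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Returns cached, unused blocks to the driver; live tensors are kept.
        torch.cuda.empty_cache()
    return collected
```

A cleanup like this only frees objects that nothing references any more, so models still held in a cache or a global list stay in RAM until those references are dropped.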
Hi @hikmet-koyuncu, please make a fork or contribute to this repo and I can take a look at your code
Hi,
I don't know much about using GitHub. When I have free time, I will learn. I can send you my edited version of "ControlNet 1.1.232". I added the comment "Hikmet Koyuncu" on each changed part.
hi @hikmet-koyuncu, The code you uploaded to mediafire seems to be old (2023-07-18), and it's missing some code so I can't run your code yet. Can you upload the full update?
Yes, because I uploaded it a long time ago, but nobody cared about it. I am still using this version.