sd-webui-controlnet

4GB gpu works?

Open lllyasviel opened this issue 2 years ago • 5 comments

Hey Mikubill, I'm trying to understand why a1111 requires so little GPU memory. The model itself is 5.7G and simply loading it already hits OOM. How is it optimized to fit in 4GB?

lllyasviel avatar Feb 14 '23 06:02 lllyasviel

Since the model can be split into base_model (~4G) + ControlNet_model (~1.6G/700M), the extension itself only reads the ControlNet part and the WebUI handles the rest. In lowvram/medvram mode the WebUI loads the base model part by part, so 4GB should be enough, though performance drops.
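For illustration, roughly what "only reads the ControlNet part" means. This is just a sketch, not the extension's actual loading code; the checkpoint filename and the "control_model." key prefix are assumptions:

```python
import torch

# Sketch: load a ControlNet checkpoint on CPU and keep only the control-branch
# tensors, leaving the ~4G base model to whatever the WebUI already has loaded.
# The "control_model." key prefix is an assumption for illustration.
ckpt = torch.load("control_sd15_canny.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

control_only = {k: v for k, v in state_dict.items()
                if k.startswith("control_model.")}
print(f"kept {len(control_only)} of {len(state_dict)} tensors")
```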

--medvram: Makes the Stable Diffusion model consume less VRAM by splitting it into three parts - cond (for transforming text into numerical representation), first_stage (for converting a picture into latent space and back), and unet (for actual denoising of latent space) and making it so that only one is in VRAM at all times, sending others to CPU RAM. Lowers performance, but only by a bit - except if live previews are enabled.

--lowvram: An even more thorough optimization of the above, splitting unet into many modules, and only one module is kept in VRAM. Devastating for performance.
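Rough sketch of the lowvram mechanism described above (not the actual WebUI code): a forward pre-hook moves each submodule onto the GPU right before it runs and evicts the previous one back to CPU RAM, so only one submodule occupies VRAM at a time.

```python
import torch
import torch.nn as nn

gpu, cpu = torch.device("cuda"), torch.device("cpu")
resident = []  # modules currently kept in VRAM

def swap_in(module, _inputs):
    # Evict whatever ran last, then bring in the module about to run,
    # so only one submodule occupies VRAM at any time.
    while resident:
        resident.pop().to(cpu)
    module.to(gpu)
    resident.append(module)

# Stand-in for the unet: in the real case these would be its submodules.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
for sub in model:
    sub.register_forward_pre_hook(swap_in)

with torch.no_grad():
    out = model(torch.randn(1, 512, device=gpu))
```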

Mikubill avatar Feb 14 '23 06:02 Mikubill

interesting. let me see if i can steal some of that lowvram code tomorrow

lllyasviel avatar Feb 14 '23 06:02 lllyasviel

--lowvram optimization is an idea by https://github.com/basujindal/stable-diffusion

~~when combined with --xformers (saves up to half the memory) it's possible.~~

~~I don't think we're using --medvram (different optimization)?~~

~~you lose some speed, my machine goes from 4.1it/s to 1.1s/it at 512x512~~

4GB with the xformers optimization and the ControlNet Low VRAM option alone is enough to make images up to SD2 size. Adding webui's --lowvram allows up to 1344x1280, at 7.5 s/it.

ClashSAN avatar Feb 14 '23 07:02 ClashSAN

@lllyasviel I don't know if you need this info, but these optimizations may also be used for training. --medvram and --xformers are two that have worked to reduce memory requirements for users.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4868?sort=new https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/4056

ClashSAN avatar Feb 14 '23 12:02 ClashSAN

I have a 1050 Ti 4GB and 768x512 works without ControlNet. With ControlNet and --medvram (even with the Low VRAM option checked), the moment VRAM hits 4096MB torch times out and crashes the GPU driver (this happens even on normal generation whenever I try to render anything higher than 768x768). I get this error whenever VRAM hits the maximum, for whatever reason. It seems like the webui tries to load both ControlNet and the model at the same time, which results in this error.

Not sure what the Low VRAM option does exactly, but to make things work without any issue it would be great if ControlNet did its job and then unloaded from memory before the actual model loads. Not sure if that's possible.

Edit: the --lowvram command line option in webui works. ControlNet really needs about ~1.5GB more, and with --medvram only (even if it works without it), Auto1111 tries to load that on top of the already loaded ~3.5GB or so, which is the reason for the CUDA timeout and driver restart.
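To sketch what I mean by unloading ControlNet before the base model comes in (just an illustration, not how the extension actually works; `controlnet` is a hypothetical module holding the extension's weights):

```python
import gc
import torch

def unload_controlnet(controlnet: torch.nn.Module) -> None:
    # Push the ControlNet weights back to system RAM and release the cached
    # CUDA blocks so the ~3.5GB base model can fit on a 4GB card.
    controlnet.to("cpu")
    gc.collect()
    torch.cuda.empty_cache()
```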

Mich-666 avatar Feb 26 '23 16:02 Mich-666