
How to allocate memory from 2nd GPU?

aeon3 opened this issue 2 years ago · 16 comments

Here is the error I have run into:

"RuntimeError: CUDA out of memory. Tried to allocate 18.00 GiB (GPU 0; 24.00 GiB total capacity; 20.51 GiB already allocated; 618.87 MiB free; 20.59 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"

I have a 2nd GPU which could be used to allocate that extra 18 GB; however, I need help figuring out how to show SD that a 2nd GPU is present.

Any thoughts?
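
For what it's worth, PyTorch itself can already see and target a second card; the open question is getting the webui to use it. A minimal sketch of the underlying mechanism, assuming two CUDA devices are visible (the tiny Linear model is just a stand-in for the real one):

    import torch

    print(torch.cuda.device_count())           # 2 if both cards are visible
    device = torch.device("cuda:1")            # second GPU; "cuda:0" is the first
    model = torch.nn.Linear(8, 8).to(device)   # stand-in for the SD model
    x = torch.randn(1, 8, device=device)
    print(model(x).device)                     # cuda:1

Alternatively, setting CUDA_VISIBLE_DEVICES=1 before launch hides the first card, so "cuda:0" inside the process maps to the physical second GPU.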

aeon3 avatar Sep 08 '22 13:09 aeon3

Using memory across two GPUs is not simple. I only have one, so I can't research/develop this.

AUTOMATIC1111 avatar Sep 08 '22 20:09 AUTOMATIC1111

Using memory across two GPUs is not simple. I only have one, so I can't research/develop this.

Oh hi. Well, I have mine linked with NVLink; I thought that would make it a breeze to benefit from memory pooling. I guess it is not that different from having 2 unlinked GPUs after all?

aeon3 avatar Sep 08 '22 21:09 aeon3

Would be interested in this as well. I don't think something like SLI is the answer though. Even just distributing the batch or iterations across available GPUs would be a start.

dev-greene avatar Sep 09 '22 13:09 dev-greene

Found this guy talking about it here: https://youtu.be/hBKcL8fNZ18?list=PLzSRtos7-PQRCskmdrgtMYIt_bKEbMPfD&t=481

Not sure if it's helpful or not, but he shows some code.

aeon3 avatar Sep 09 '22 14:09 aeon3

This is the most intuitive and complete webui fork. It would be amazing if this could be implemented here:

NickLucche/stable-diffusion-nvidia-docker#8

The potential to double image output even with the same VRAM is awesome.

from #311

mchaker avatar Sep 23 '22 01:09 mchaker

For more than just 2 GPUs, NickLucche has code:

I imagine you're really busy with all the requests and bugs, but if you have 5 minutes, have a look at this file in NickLucche's project:

https://github.com/NickLucche/stable-diffusion-nvidia-docker/blob/master/parallel.py

He apparently wrote an external wrapper that calls the application, letting it query whether multiple GPUs are present; if there are, data parallelism comes into play.

mchaker avatar Sep 23 '22 01:09 mchaker

Hi! I could probably port this multi-gpu feature, but I would appreciate some pointers as to where in the code I should look for the actual model (I am using the vanilla one from huggingface). Easiest mode would be implementing a ~data parallel approach, in which we have one model per GPU and you distribute the workload among them. Given the amount of features this repo provides I think it could take some time to have em all supported in the parallel version. Let me know your thoughts on this.
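
To illustrate the data parallel idea with the vanilla huggingface pipeline (a sketch of the approach only, not the actual parallel.py implementation; the model id and prompts are placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    model_id = "CompVis/stable-diffusion-v1-4"  # placeholder checkpoint

    # One full copy of the model per visible GPU.
    pipes = [
        StableDiffusionPipeline.from_pretrained(model_id).to(f"cuda:{i}")
        for i in range(torch.cuda.device_count())
    ]

    prompts = ["a red fox", "a blue whale", "a green hill", "a grey cat"]

    # Shard the batch across the pipelines. A real implementation would run
    # these in separate threads/processes so the GPUs work concurrently;
    # this loop is sequential and only shows the data split.
    images = []
    for i, pipe in enumerate(pipes):
        images += pipe(prompts[i::len(pipes)]).images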

NickLucche avatar Oct 02 '22 08:10 NickLucche

Hi! I could probably port this multi-gpu feature, but I would appreciate some pointers as to where in the code I should look for the actual model (I am using the vanilla one from huggingface).

Easiest mode would be implementing a ~data parallel approach, in which we have one model per GPU and you distribute the workload among them.

Given the amount of features this repo provides I think it could take some time to have em all supported in the parallel version.

Let me know your thoughts on this.

Is this still in the works? I understand it could take a while to make everything support multiple GPUs, but if I could use both of my GPUs to generate images, that would be good enough. Like, if I select a batch of 2, each GPU would do one. If I did 8, each would do 4.

Is that complicated?

swcrazyfan avatar Oct 25 '22 04:10 swcrazyfan

@swcrazyfan you can already load two instances at the same time. https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3377

Just use --device-id 0 in one and --device-id 1 in the other. Also --port some_port_number with a different port for each instance.

Of course it is not an optimal solution and you might need more RAM to run both instances. --lowram might help too.
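
Concretely, that looks something like this (assuming the stock launch.py entry point; the port numbers are arbitrary):

    python launch.py --device-id 0 --port 7860
    python launch.py --device-id 1 --port 7861

Each instance then serves its own UI on its own port, pinned to its own GPU.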

Extraltodeus avatar Oct 26 '22 16:10 Extraltodeus

Is this being worked upon? It sounds like an awesome feature. Even if it's restricted to txt2img, it'd be a start.

I guess this would require major changes to the way images are handled right now; there would probably need to be a queue of sorts to make this work.

precompute avatar Nov 01 '22 09:11 precompute

Hi! I could probably port this multi-gpu feature, but I would appreciate some pointers as to where in the code I should look for the actual model (I am using the vanilla one from huggingface). Easiest mode would be implementing a ~data parallel approach, in which we have one model per GPU and you distribute the workload among them. Given the amount of features this repo provides I think it could take some time to have em all supported in the parallel version. Let me know your thoughts on this.

I'd be happy to help test this if it's something that's being worked on. I'm currently running an 11x RTX 3090 server for a Discord community using @Extraltodeus's --device-id feature (https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/3377), and I think that having some parallelism would further benefit the community greatly. I'm not sure if it's ok to mention community links here, but the info is in my profile, and you're welcome to DM me on Discord if this is something you would like help testing.

Lukium avatar Nov 07 '22 10:11 Lukium

Just popping in to check on this. I also have an 8x 3090 machine and a 2x3090 machine (both have 256GB RAM) that would be great for testing parallelization.

Omegadarling avatar Nov 28 '22 09:11 Omegadarling

This would be a really great feature. Even just being able to distribute a batch would help.

Having a round-robin for "next GPU" would also be useful to distribute web requests across a pool of GPUs.
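
A round-robin dispatcher is simple enough to sketch (a hypothetical helper, not existing webui code):

    import itertools
    import torch

    # Cycle through the visible device ids; each request takes the next one.
    _pool = itertools.cycle(range(torch.cuda.device_count()))

    def next_device() -> torch.device:
        """Return the next GPU in the pool, round-robin."""
        return torch.device(f"cuda:{next(_pool)}")

Each web request would then run its generation on next_device() instead of a hard-coded cuda:0.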

zeigerpuppy avatar Jan 17 '23 00:01 zeigerpuppy

p.s. I think this issue has changed a bit from a memory question to a multi-GPU support question in general. It may be good to alter the title to something like: "Multi GPU support for parallel queries". I think that is somewhat distinct from the first query regarding memory pooling (which is a much more difficult ask!)

zeigerpuppy avatar Jan 17 '23 00:01 zeigerpuppy

Using memory across two GPUs is not simple. I only have one, so I can't research/develop this.

Well, let's get it funded then.

hananbeer avatar Mar 08 '23 04:03 hananbeer

I'm not sure this is really a parallel query question though, is it? I found it while looking for using multiple GPUs for a single query, and most of the discussion was based on that.

moxSedai avatar Apr 14 '23 04:04 moxSedai