
Stable Diffusion 3.5 Large CUDA OUT_OF_MEMORY on RTX 3090

Open danielclough opened this issue 1 year ago • 12 comments

When I run cargo run --example stable-diffusion-3 --release --features=cuda -- --which 3.5-large --prompt "pretty picture" I get Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory") with both Stable Diffusion 3.5 Large and Turbo.

According to this chart from stability.ai, they should run on an RTX 3090.

chart

danielclough avatar Nov 05 '24 08:11 danielclough

That seems odd; we made a couple of optimizations to memory usage following #2574, and in the end SD 3.5 large was reported to work well on a GPU with only 20GB of memory. Maybe some other processes are using the memory? If not, it would be good to run an nsys profile to see when the memory is being used.
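A minimal nsys invocation for this could look like the sketch below (assuming Nsight Systems is installed, e.g. via the CUDA toolkit; the flags are standard nsys options, and the output name is arbitrary):

```shell
# Profile the run and record CUDA memory usage over time;
# --cuda-memory-usage=true adds a memory timeline to the report.
nsys profile \
  --trace=cuda \
  --cuda-memory-usage=true \
  --output=sd35-oom \
  cargo run --example stable-diffusion-3 --release --features=cuda -- \
    --which 3.5-large --prompt "pretty picture"

# Open sd35-oom.nsys-rep in the Nsight Systems GUI, or summarize on the CLI:
nsys stats sd35-oom.nsys-rep
```

The memory timeline should show which allocation pushes usage past the 24GB of a 3090.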

LaurentMazare avatar Nov 05 '24 08:11 LaurentMazare

There are no other processes running.

How would you like me to run nsys?

Here's some system info:

cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.1 LTS"

rustc --version
rustc 1.81.0 (eeb90cda1 2024-09-04)

cargo --version
cargo 1.81.0 (2dbb1af80 2024-08-20)

NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6
...

danielclough avatar Nov 05 '24 09:11 danielclough

Have you tried it with cudnn in addition to the cuda feature? I found it used less RAM when cudnn was enabled.

AlpineVibrations avatar Nov 06 '24 21:11 AlpineVibrations

Have you tried it with cudnn in addition to the cuda feature? I found it used less RAM when cudnn was enabled.

I have not.

@LaurentMazare Should cudnn be required to run it properly?

danielclough avatar Nov 12 '24 04:11 danielclough

cudnn shouldn't be necessary, but it might indeed help reduce GPU memory usage. That said, running the command you mentioned only uses ~20GB of memory in my case, so my guess is that something else is off there. 20241112-mem

LaurentMazare avatar Nov 12 '24 07:11 LaurentMazare

The GPU doesn't actually fill up all the memory.

image

Any suggestions for how to troubleshoot this would be welcome.

danielclough avatar Nov 12 '24 11:11 danielclough

Not sure how much I would trust the memory usage reported by some external tool (especially here, where it seems to only measure memory usage every 10s); it's probably safer to use nsys to get a proper memory profile.

LaurentMazare avatar Nov 12 '24 11:11 LaurentMazare

Are you unable to run it with cudnn? It really did help, and my ADA4000 with 20GB won't run SD3.5L without it. I would also recommend nsys for monitoring.
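For reference, enabling cudnn alongside cuda is just an extra Cargo feature flag on the same command (this assumes cuDNN is installed on the system; the example itself is unchanged):

```shell
# Build the example with both the cuda and cudnn features; cuDNN can select
# convolution algorithms with smaller workspace requirements, which may
# lower peak VRAM usage.
cargo run --example stable-diffusion-3 --release --features cuda,cudnn -- \
  --which 3.5-large --prompt "pretty picture"
```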

AlpineVibrations avatar Nov 14 '24 23:11 AlpineVibrations

Unless it is supposed to require cudnn I am not interested in the workaround.

This isn't something that is important to me, so I don't know if I will make time to troubleshoot it without hand-holding.

Feel free to close the issue if cudnn is supposed to be required.

Otherwise, I guess someone else will care enough to troubleshoot.

danielclough avatar Nov 15 '24 00:11 danielclough

That chart you showed is for 3 Medium; you are trying to load 3.5 Large. How much memory is on your video card?
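As a rough sanity check on whether 3.5 Large should fit in 24GB at all, here is a back-of-the-envelope weight-memory estimate (the parameter counts below are approximate public figures for the SD 3.5 Large pipeline, not measured from this run, and activations/workspace are ignored):

```python
# Approximate fp16 weight footprint of the SD 3.5 Large pipeline.
# Parameter counts are rough public figures (assumptions, not measurements).
BYTES_PER_PARAM_FP16 = 2

components = {
    "MMDiT (~8B)": 8.0e9,
    "T5-XXL text encoder (~4.7B)": 4.7e9,
    "CLIP text encoders + VAE (~0.5B)": 0.5e9,
}

total_bytes = sum(n * BYTES_PER_PARAM_FP16 for n in components.values())
for name, n in components.items():
    print(f"{name}: {n * BYTES_PER_PARAM_FP16 / 1e9:.1f} GB")
print(f"total: {total_bytes / 1e9:.1f} GB")  # ~26.4 GB, above a 3090's 24 GB
```

If those figures are roughly right, keeping every component resident in fp16 exceeds a 3090's 24GB, which is why offloading the text encoders (as discussed in #2574) or using cudnn/quantization matters here.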

AlpineVibrations avatar Nov 15 '24 20:11 AlpineVibrations

Hey there! I’m having the same issue. I’m using an RTX 4090 and Ubuntu 24 LTS. I just tried the Hugging Face diffusers library. I think it might be related to PyTorch, but I’m not entirely sure. Can you help?

eademir avatar Jan 06 '25 18:01 eademir

Hi, I'm having the same issue with an RTX 4090. I tried running it with the diffusers library and it says out of memory, but when I ran it in ComfyUI it took just 12GB of VRAM, which is half of the card's maximum. I don't know why this happens.

NaufalF121 avatar Oct 12 '25 05:10 NaufalF121