stable-diffusion-webui
[Bug]: 1.6.0 Hires. fix uses all memory on an AMD 7900 XT
Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
What happened?
I saw some reports about NVIDIA issues in v1.6.0, but none about AMD so far. When generating images with hires. fix, 99% (or more) of VRAM is used, which can lead to either an out-of-memory error or my entire PC crashing.
I'm on Manjaro Linux with kernel 6.4.12, running v1.6.0 with Python 3.11.3 and torch 2.1.0 + rocm5.5. I also tried a fresh venv (which downloads rocm 5.6), but the issue still happens there. Below are some screenshots showing GPU usage while generating 512x768 images with a x1.5 hires fix:
(screenshot: GPU memory usage on v1.5.2)
(screenshot: GPU memory usage on v1.6.0)
Lower memory usage corresponds to the first pass, higher memory usage to the hires fix.
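For anyone who wants to watch the same numbers without a GUI monitor, something along these lines should work (assuming rocm-smi from the ROCm install is on the PATH; this is not necessarily how the screenshots above were captured):

```bash
# Poll VRAM usage once per second while a generation is running.
# The jump from the first pass to the hires pass shows up directly
# in the reported vram numbers.
watch -n 1 rocm-smi --showmeminfo vram
```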
Here is the message when I hit the out-of-memory error:
OutOfMemoryError: HIP out of memory. Tried to allocate 5.70 GiB. GPU 0 has a total capacty of 19.98 GiB of which 5.66 GiB is free. Of the allocated memory 8.05 GiB is allocated by PyTorch, and 5.92 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
Steps to reproduce the problem
Generate images using hires fix with batch count > 1 (I generally use 8, which makes the error more likely to appear). The memory usage screenshots above were obtained while generating 512x768 images upscaled x1.5 in the hires pass. On v1.5.2 and earlier I generated 960x540 images upscaled x2 (i.e. 1920x1080, more pixels than the 768x1152 this workflow produces) with no issue at all, using the same GPU and ROCm version. I used no extensions during generation.
What should have happened?
A GPU with 20 GB of VRAM should not run out of memory generating 512x768 images upscaled x1.5; as seen in the v1.5.2 screenshots, usage peaks at about 75% (roughly 15 GB).
Sysinfo
What browsers do you use to access the UI?
Mozilla Firefox, Brave
Console logs
https://pastebin.com/K1vytEaw
Additional information
As I have an AMD Ryzen 9 7900X CPU (which has an integrated GPU), I added export ROCR_VISIBLE_DEVICES=0 to webui-user.sh so that only my discrete GPU is used.
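For anyone else on an iGPU-equipped Ryzen, this is roughly what that part of webui-user.sh looks like; treating index 0 as the discrete card is an assumption, so check the agent order that rocminfo reports on your system:

```bash
# webui-user.sh (excerpt)
# Hide the Ryzen iGPU from ROCm so that only the discrete 7900 XT is used.
# Device index 0 is assumed to be the discrete card here; verify with
# `rocminfo` (the GPU agents are listed in enumeration order).
export ROCR_VISIBLE_DEVICES=0
```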
I have the same issue on an Apple M1 Max with 32 GB; hires. fix destroys everything :<
Is there any news about this issue? Are other users affected? Although 1.6.0 was released over 2 months ago, I still have this issue (I just tried creating a new venv to check, which downloaded the latest PyTorch 2.2.0 and rocm 5.6). Is there any chance of finding memory-management issues somewhere in the code base that could explain this? I might have a try at bisecting the issue over the weekend if I have time, to see exactly what caused the problem, but I expect it to take quite a bit of time.
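In case anyone wants to try it before I get to it, the bisect I have in mind would look roughly like this, using the v1.5.2 and v1.6.0 tags as the good/bad endpoints (each step means relaunching the webui and running a hires-fix batch):

```bash
cd stable-diffusion-webui
git bisect start
git bisect bad v1.6.0    # hires. fix exhausts VRAM on this release
git bisect good v1.5.2   # same settings peaked around 15 GB here
# At each step: relaunch the webui, run a hires-fix batch, then mark it with
#   git bisect good    (VRAM stays around the 1.5.2 level)
#   git bisect bad     (VRAM climbs to ~100% or the run OOMs)
git bisect reset         # return to the original checkout when finished
```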
I ended up not bisecting, as any non-release commit crashes immediately when I try to run the webui.
However, and I don't want to jinx it, I think I found a nice workaround in the comments of #6460 and added the following line to webui-user.sh:
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512"
I played a bit with the 0.7 value; I started with 0.9 but still got memory issues, while with 0.7 I haven't had any problems yet.
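For clarity, the relevant part of my webui-user.sh now looks like the excerpt below; the 0.7 and 512 values are just what happened to work for me, they are tunables rather than magic numbers:

```bash
# webui-user.sh (excerpt)
# Only expose the discrete GPU (see my earlier comment).
export ROCR_VISIBLE_DEVICES=0
# Tell PyTorch's caching allocator to start reclaiming cached blocks once
# roughly 70% of VRAM is in use, and stop splitting blocks larger than 512 MB
# to limit fragmentation. 0.9 still ran out of memory for me; 0.7 has been
# stable so far.
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.7,max_split_size_mb:512"
```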
The problem goes away if InvokeAI optimization is used instead of Doggettx
How do you do this exactly?
You can change the optimization method in Settings -> Optimizations (in the Stable Diffusion category, from version 1.7+). I did end up switching to InvokeAI, though I kept the above command as well; not a single problem since then (with Doggettx, it would still crash/freeze on rare occasions).
Not sure if the issue should be closed with this workaround, since it still appears to be a regression for the Doggettx optimization.
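If you would rather not dig through the settings UI, there is also a launch flag that should force the same optimization; I have not re-checked it against 1.6.0 specifically, so treat it as a hint rather than a guarantee:

```bash
# webui-user.sh (excerpt)
# Force InvokeAI's cross-attention optimization instead of Doggettx.
export COMMANDLINE_ARGS="--opt-split-attention-invokeai"
```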
Changing the optimization method didn't help for me.
It still uses all my VRAM on SDXL. I really want to just use Automatic1111; I am not a fan of ComfyUI (it's too much work imo), but it doesn't have this issue at all. What is the core of the issue? VAE decode seems to be the worst part and is where it usually crashes.
Oh yeah, it also likes to make my PC randomly reset, even though my system is otherwise completely stable, even when running LLMs :D
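Not a fix for the regression itself, but one stock launch flag that is often used against exactly this kind of SDXL VAE-decode OOM, in case it helps here (whether it addresses the underlying problem is an open question):

```bash
# webui-user.sh (excerpt)
# Apply the --medvram memory-saving behaviour only when an SDXL model is
# loaded, which usually lowers peak VRAM, including during VAE decode.
export COMMANDLINE_ARGS="--medvram-sdxl"
```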
@Cykyrios You can use my repo to install Stable Diffusion on ROCm with an RX 7900 XT; it works around AMD ROCm RDNA 2 & 3 problems with Docker containers on Linux: https://github.com/hqnicolas/StableDiffusionROCm. It was stable as of 1.9.3 (latest). If you like this automation repo, please leave a star on it ⭐
Hey, out of curiosity, does this use ROCm 6.1? There are some fixes in that release that might apply to my GPU; I've been meaning to test them on Ubuntu with a manual update but haven't had the chance.
@Beinsezii I want to make this repo an AMD page for Stable Diffusion; I think these ROCm fixes will make a difference too. You will need to change the driver installation bash script from 6.0 to 6.1, and I think you will need to change the PyTorch install from ROCm 5.6 to 6.1 as well.
This line, I would assume?
ENV TORCH_COMMAND="pip install torch==2.1.2+rocm5.6 torchvision==0.16.2+rocm5.6 --extra-index-url https://download.pytorch.org/whl/rocm5.6"
Yes, you will need to test compatibility with the installation image, because there are three layers of ROCm:
1. Bare-metal driver: ROCm 6.0
2. Docker image driver: already ROCm 6.1
3. Python torch wheel: ROCm 5.6
I think you can change the bare-metal driver first and measure the results.
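For the third layer, the torch install inside the image would presumably move to the ROCm 6.1 wheel index, roughly as below; the exact torch/torchvision versions to pin are an assumption, so check what https://download.pytorch.org/whl/rocm6.1 actually serves before baking it into TORCH_COMMAND:

```bash
# Hypothetical replacement for the rocm5.6 TORCH_COMMAND quoted above:
# install whatever torch/torchvision builds the ROCm 6.1 index provides.
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1
```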