
Please fix AMD GPU mem allocation issue.

Open xjbar opened this issue 1 year ago • 23 comments

There seems to be a memory loop issue causing the application to crash when trying to render images. This is a major issue, and I would like to know whether it is going to be addressed. Just curious where it is on the kanban board :D

xjbar avatar Dec 09 '23 03:12 xjbar

+1

OycheD avatar Dec 09 '23 10:12 OycheD

+1

pythonmaster9000 avatar Dec 09 '23 10:12 pythonmaster9000

+1

AlexeyJersey avatar Dec 09 '23 11:12 AlexeyJersey

+1

stainz2004 avatar Dec 09 '23 20:12 stainz2004

+1

ferencsimon415 avatar Dec 09 '23 22:12 ferencsimon415

+1

TheRexo avatar Dec 09 '23 22:12 TheRexo

+1

heltonteixeira avatar Dec 10 '23 19:12 heltonteixeira

politely +1

I appreciate being able to use your software, and I would be happy to provide any logs or exceptions needed to help the devs on this project.

I have 32GB of RAM, an 8GB Radeon 6650, and an AMD 7950. I have tried switches such as --lowvram, which yields an exception stating the build was not compiled for CUDA; the other suggested fixes I tried all appear to result in the system first allocating 100% of available GPU memory, then not using it, and crashing when it needed roughly 65MB of GPU memory =(

Let me know if I can provide any other details.

grendahl06 avatar Dec 10 '23 20:12 grendahl06

+1, highly appreciate your work.

cytrixme avatar Dec 12 '23 11:12 cytrixme

A week ago I installed Fooocus on Manjaro Linux on a laptop (AMD Ryzen 6900HS, 32GB RAM, AMD 6800S 8GB VRAM). Everything ran almost without a HIP error using any SDXL model I tested, with any option and with up to 4 LoRAs in advanced mode. So far, I have only been able to get a memory error with an "Upscale (2x)".

I installed Fooocus by cloning the GitHub repo:

git clone https://github.com/lllyasviel/Fooocus.git

Created a Python virtual environment:

python -m venv venv
source venv/bin/activate

Upgraded pip to the latest version (probably not necessary):

pip install --upgrade pip

Installed the PyTorch nightly build with ROCm 5.7 (see the "Install Pytorch" paragraph on https://pytorch.org/):

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7

Installed the requirements:

pip install -r requirements_versions.txt

Created a file "webui.sh" with the content below:

#!/bin/sh
source venv/bin/activate
HSA_OVERRIDE_GFX_VERSION=10.3.0 python entry_with_update.py --preset realistic 

Made it executable and ran it:

chmod +x webui.sh
./webui.sh

The HSA_OVERRIDE_GFX_VERSION variable seems to be the most important configuration option. If I remember correctly, 10.3.0 should work with RDNA2 cards while 11.0.0 works with RDNA3 cards.
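
For example, on an RDNA3 card the launch line in webui.sh would presumably become the following (untested here, just swapping the override value):

HSA_OVERRIDE_GFX_VERSION=11.0.0 python entry_with_update.py --preset realistic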

However, yesterday, after upgrading Fooocus:

git pull 

I had this error:

ERROR: Cannot install -r requirements_versions.txt (line 1), -r requirements_versions.txt (line 12), -r requirements_versions.txt (line 14), -r 
requirements_versions.txt (line 16), -r requirements_versions.txt (line 18), -r requirements_versions.txt (line 3), -r requirements_versions.txt (line 5),
-r requirements_versions.txt (line 8) and numpy==1.23.5 because these package versions have conflicting dependencies.

The conflict is caused by:
	The user requested numpy==1.23.5
	torchsde 0.2.5 depends on numpy>=1.19.*; python_version >= "3.7"
	transformers 4.30.2 depends on numpy>=1.17
	accelerate 0.21.0 depends on numpy>=1.17
	scipy 1.9.3 depends on numpy<1.26.0 and >=1.18.5
	pytorch-lightning 1.9.4 depends on numpy>=1.17.2
	gradio 3.41.2 depends on numpy~=1.0
	opencv-contrib-python 4.8.0.74 depends on numpy>=1.21.2; python_version >= "3.10"
	opencv-contrib-python 4.8.0.74 depends on numpy>=1.23.5; python_version >= "3.11"
	opencv-contrib-python 4.8.0.74 depends on numpy>=1.17.0; python_version >= "3.7"
	opencv-contrib-python 4.8.0.74 depends on numpy>=1.17.3; python_version >= "3.8"
	opencv-contrib-python 4.8.0.74 depends on numpy>=1.19.3; python_version >= "3.9"
	onnxruntime 1.16.3 depends on numpy>=1.24.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Replacing numpy==1.23.5 with numpy==1.24.2 in requirements_versions.txt and installing again fixes the problem and everything runs fine, but I am not sure this is the right way to do it.
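
For reference, the quick workaround boils down to something like this (a sketch, assuming you are in the Fooocus directory with the venv activated):

sed -i 's/numpy==1.23.5/numpy==1.24.2/' requirements_versions.txt
pip install -r requirements_versions.txt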

psadac avatar Dec 15 '23 22:12 psadac

"onnxruntime 1.16.3 depends on numpy>=1.24.2" means you are using python 3.11 3.10 will not have this problem

lllyasviel avatar Dec 15 '23 22:12 lllyasviel

Yes, you're right, I am using Python 3.11. However, when I installed Fooocus a week ago I didn't get any error; I may just have been lucky.
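
If I ever want to stay on the pinned requirements, something like this should recreate the venv with Python 3.10 (a sketch, assuming python3.10 is installed on the system):

python --version
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements_versions.txt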

psadac avatar Dec 16 '23 18:12 psadac

I've finally got it to render with ver. 2.1.860 using my 6700 XT (12GB VRAM); however, the VRAM is still being detected as only 1024MB, and renders are therefore very slow. Task Manager and the AMD Overlay show full 12GB GPU utilisation though...

My run.bat looks like this; it includes the --attention-split flag, as suggested by the script while run.bat is running. Any ideas?

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset realistic --attention-split
pause

(Screenshots attached.)

magicAUS avatar Jan 06 '24 05:01 magicAUS

I've finally got it to render with ver. 2.1.860 using my 6700 XT (12GB VRAM); however, the VRAM is still being detected as only 1024MB, and renders are therefore very slow. Task Manager and the AMD Overlay show full 12GB GPU utilisation though...

My run.bat looks like this; it includes the --attention-split flag, as suggested by the script while run.bat is running. Any ideas?

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml --preset realistic --attention-split
pause

(Screenshots attached.)

I am experiencing the exact same issue. My GPU is an AMD 7800 XT. I have done everything stated in the quoted post. Not sure if I am missing something entirely or simply doing something wrong. Any insight would be greatly appreciated. Granted, this does not stop the program from running; it is just slower than expected.

oXb3 avatar Feb 04 '24 02:02 oXb3

@xjbar currently doing issue cleanup. Is this issue still present for you using the latest version of Fooocus or can it be closed?

mashb1t avatar Feb 22 '24 22:02 mashb1t

@xjbar currently doing issue cleanup. Is this issue still present for you using the latest version of Fooocus or can it be closed?

@mashb1t - mine and @oXb3's issue is still present in the latest version (2.1.865), FYI

magicAUS avatar Feb 23 '24 02:02 magicAUS

Running here without any problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards use: HSA_OVERRIDE_GFX_VERSION=10.3.0
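
For example, exporting the override before launching (a sketch mirroring the webui.sh posted earlier in this thread; the exact launch flags depend on your setup):

export HSA_OVERRIDE_GFX_VERSION=10.3.0
python entry_with_update.py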

hqnicolas avatar Mar 10 '24 01:03 hqnicolas

Running here without any problem: https://gist.github.com/hqnicolas/5fbb9c37dcfc29c9a0ffe50fbcb35bdd. For RX 6000 cards use: HSA_OVERRIDE_GFX_VERSION=10.3.0

@hqnicolas does everything on that URL go into run.bat?

magicAUS avatar Mar 16 '24 15:03 magicAUS

@magicAUS you need to do a clean install of Ubuntu 22.04 and copy and paste every step manually into the terminal. First, read the blue titles that say: 1 - Driver install, 2 - Before Run, 3 - Run it.

hqnicolas avatar Mar 18 '24 11:03 hqnicolas

@magicAUS you need to do a clean install of Ubuntu 22.04 and copy and paste every step manually into the terminal. First, read the blue titles that say: 1 - Driver install, 2 - Before Run, 3 - Run it.

@hqnicolas I think the OS is the differentiator in getting it working. @oXb3 and I are on Windows (11 Pro for me).

magicAUS avatar Mar 18 '24 11:03 magicAUS

@magicAUS insert an extra SSD into your machine and build it there.

hqnicolas avatar Mar 18 '24 11:03 hqnicolas

I am facing the exact same issue on Windows. I am running an RX 7800 XT with 32GB of RAM. Fooocus only recognizes 1024MB of VRAM, and when it starts generating it throws the following:

Fooocus\modules\anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at C:__w\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
  s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
3%|██▊ | 1/30 [00:07<03:23, 7.01s/it]
[W dml_heap_allocator.cc:120] DML allocator out of memory!

Any solution to it? I've been searching for one but no success so far.

Thank you!

mathshenry avatar Jun 21 '24 00:06 mathshenry

Fooocus really was cool; please understand I'm not insulting their work.

I have an RX 6650, and I found that SD.Next with the ZLUDA pipeline works best for me.

Most of the memory issues seem to be a Microsoft and AMD bug, but somehow the ZLUDA path makes it work pretty well.

I was getting 15 minutes per image on CPU; now that I'm running on the GPU I get 1-2 minutes per image.

I hope that helps a little.

grendahl06 avatar Jun 21 '24 00:06 grendahl06