
[Issue]: Torch is not able to use GPU

Open aber007 opened this issue 1 year ago • 28 comments

Checklist

  • [X] The issue exists after disabling all extensions
  • [ ] The issue exists on a clean installation of webui
  • [ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • [ ] The issue exists in the current version of the webui
  • [ ] The issue has not been reported before recently
  • [ ] The issue has been reported before but has not been fixed yet

What happened?

Test from Scripts:

Collecting environment information...
C:\Users\PC\stable-diffusion-webui-directml\venv\lib\site-packages\torch\cuda\__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
PyTorch version: 2.0.1+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2500
DeviceID=CPU0
Family=205
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2500
Name=13th Gen Intel(R) Core(TM) i5-13500
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] open-clip-torch==2.20.0
[pip3] pytorch-lightning==1.9.4
[pip3] torch==2.0.1+cu118
[pip3] torchdiffeq==0.2.3
[pip3] torchmetrics==1.2.1
[pip3] torchsde==0.2.6
[pip3] torchvision==0.15.2+cu118
[conda] Could not collect

Using an AMD 7800 XT and I have followed the AMD installation instructions.

Steps to reproduce the problem

Starting the UI

What should have happened?

.

What browsers do you use to access the UI ?

No response

Sysinfo

.

Console logs

NVIDIA driver was found.
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.7.0
Commit hash: cfa6e40e6d7e290b52940253bf705f282477b890
Traceback (most recent call last):
  File "C:\Users\PC\stable-diffusion-webui-directml\launch.py", line 48, in <module>
    main()
  File "C:\Users\PC\stable-diffusion-webui-directml\launch.py", line 39, in main
    prepare_environment()
  File "C:\Users\PC\stable-diffusion-webui-directml\modules\launch_utils.py", line 560, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Press any key to continue . . .

Additional information

No response

aber007 avatar Dec 29 '23 19:12 aber007

Got the 7800 XT as well and get the same error on Win10 and Ubuntu; still no solution. I hope this gets fixed by AMD.

pizzapizze avatar Dec 30 '23 00:12 pizzapizze

You can fix this by doing the following to your installation after upgrading to 1.7.0 with AMD cards:

  1. Go to requirements_versions.txt and change line 29 where it says torch to torch-directml
  2. Go to your webui-user.bat/sh and add the following: set COMMANDLINE_ARGS=--use-directml --reinstall-torch

Using these steps A) sets Python to use the DirectML version of Torch and B) redownloads it so it works again. You can remove --reinstall-torch after the first run; it's not needed on subsequent starts of the webui.

It seems that there is now a specific version of torch for DirectML that wasn't used before, so you have to manually adapt the files to make it work. It's a bit strange, but it gets your old install working again.
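For reference, a minimal sketch of what the relevant part of webui-user.bat could look like after step 2 (the surrounding lines are from the stock file; drop --reinstall-torch after the first successful launch):

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
rem switch to the DirectML build of torch and force a one-time reinstall
set COMMANDLINE_ARGS=--use-directml --reinstall-torch
call webui.bat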

Note: I don't take credit for this; it's something I got from another post.

nuclear314 avatar Dec 30 '23 00:12 nuclear314

Got the 7800 XT as well and get the same error on Win10 and Ubuntu; still no solution. I hope this gets fixed by AMD.

There's no way, since the error isn't from AMD. You just need to download the correct version of Torch.

zakusworo avatar Dec 30 '23 00:12 zakusworo

Issues with the DirectML fork should be reported directly to the fork's repo: https://github.com/lshqqytiger/stable-diffusion-webui-directml

w-e-w avatar Dec 30 '23 08:12 w-e-w

You can fix this by doing the following to your installation after upgrading to 1.7.0 with AMD cards:

  1. Go to requirements_versions.txt and change line 29 where it says torch to torch-directml
  2. Go to your webui-user.bat/sh and add the following: set COMMANDLINE_ARGS=--use-directml --reinstall-torch

Using these steps A) sets Python to use the DirectML version of Torch and B) redownloads it so it works again. You can remove --reinstall-torch after the first run; it's not needed on subsequent starts of the webui.

It seems that there is now a specific version of torch for DirectML that wasn't used before, so you have to manually adapt the files to make it work. It's a bit strange, but it gets your old install working again.

Note: I don't take credit for this; it's something I got from another post. @nuclear314

After days of searching, reinstalling the venv, trying different versions of torch, and reinstalling Stable Diffusion altogether, nothing was working. This is the only thing that worked for me.

I can say exactly when things started going wrong. I was producing images just fine with my 6700 XT (12 GB), but getting an occasional "Could not allocate tensor with [x amount] bytes" whenever my prompt was too long or I tried a resolution that was too high. That led me to trying different COMMANDLINE_ARGS from this thread, and I got to someone who suggested running these in a command prompt: git pull (to ensure the latest update) and pip install -r requirements.txt. It was from there that I started to get the RuntimeError whenever launching Stable Diffusion.

thisisnotreal459 avatar Dec 31 '23 12:12 thisisnotreal459

Issues with the DirectML fork should be reported directly to the fork's repo: https://github.com/lshqqytiger/stable-diffusion-webui-directml

The problem is that it's not picking up ROCm, not the use of DirectML. The AMD guide says stable-diffusion-webui-directml is for AMD + Windows systems; for AMD + Linux systems we could use this repo (I guess). stable-diffusion-webui should use ROCm instead of DirectML; the problem is that it doesn't.

Gonzalo1987 avatar Jan 04 '24 22:01 Gonzalo1987

2. =--use-directml

This is not recognized as a valid argument for me.

jaminW55 avatar Jan 27 '24 05:01 jaminW55

Issues with the DirectML fork should be reported directly to the fork's repo: https://github.com/lshqqytiger/stable-diffusion-webui-directml

The problem is that it's not picking up ROCm, not the use of DirectML. The AMD guide says stable-diffusion-webui-directml is for AMD + Windows systems; for AMD + Linux systems we could use this repo (I guess). stable-diffusion-webui should use ROCm instead of DirectML; the problem is that it doesn't.

ROCm and DirectML are different backends. Because ROCm is not fully supported on Windows, you can use WSL2 to run it, unless you prefer to use DirectML with its trade-offs on AMD GPUs.

zakusworo avatar Jan 27 '24 08:01 zakusworo

  1. =--use-directml

This is not recognized as a valid argument for me.

First, you need to pull from the SD WebUI DirectML repo, not from this (native) A1111 SD WebUI repo.

zakusworo avatar Jan 27 '24 08:01 zakusworo

you can use WSL2 to run it, unless you prefer to use DirectML with its trade-offs on AMD GPUs

Maybe I'm wrong, but I believe it's not possible to use ROCm on WSL2.

DGdev91 avatar Feb 15 '24 23:02 DGdev91

Ok, let's try to make some order...

You have two options.

OPTION 1 - Stay on Windows and use the DirectML version (slower, but still one of the best ways to run SD on Windows + AMD right now). Your best option is to use this fork: https://github.com/lshqqytiger/stable-diffusion-webui-directml

Then follow these instructions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

OPTION 2 - Use Linux + ROCm. On Linux you can use ROCm, which isn't yet fully available on Windows (on Windows there's only the HIP SDK right now, not enough for PyTorch). If you are new to Linux I suggest Ubuntu or any of its derivatives, like Kubuntu. Linux Mint should be fine too.

You'll have to install ROCm; follow the official instructions: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html On Ubuntu and derivatives I suggest the "AMDGPU Installer" method.

Then, just follow the instructions for the webui on AMD: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

DGdev91 avatar Feb 16 '24 00:02 DGdev91

Ok, let's try to make some order...

You have two options.

OPTION 1 - Stay on Windows and use the DirectML version (slower, but still one of the best ways to run SD on Windows + AMD right now). Your best option is to use this fork: https://github.com/lshqqytiger/stable-diffusion-webui-directml

Then follow these instructions: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

OPTION 2 - Use Linux + ROCm. On Linux you can use ROCm, which isn't yet fully available on Windows (on Windows there's only the HIP SDK right now, not enough for PyTorch). If you are new to Linux I suggest Ubuntu or any of its derivatives, like Kubuntu. Linux Mint should be fine too.

You'll have to install ROCm; follow the official instructions: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html On Ubuntu and derivatives I suggest the "AMDGPU Installer" method.

Then, just follow the instructions for the webui on AMD: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs

I'm using Linux, but do I need to manually install the ROCm package? It is huge!

Hackgets avatar Feb 16 '24 10:02 Hackgets

I'm using Linux, but do I need to manually install the ROCm package? It is huge!

Not manually; you have to install it following the instructions for your distribution. There are plenty of tools to do that in a straightforward way, like the AMDGPU Installer I mentioned.

And yes, you need the full suite: around 20 GB on disk.

Then you'll need another 10+ GB for PyTorch and all the dependencies (automatically installed when you first launch the webui), and several more GB for each Stable Diffusion model.

I don't remember how much space CUDA requires for Nvidia GPUs, but I'm sure it's pretty big too.

DGdev91 avatar Feb 16 '24 12:02 DGdev91

So I've been trying to figure out how to get this going correctly using ROCm with my RX 7800XT on a new Manjaro install.

One HUGE problem I'm seeing in almost all the latest OS builds (Ubuntu too for example) is they've changed Python over to a new setup that uses system packages and not everything can even be installed that way. Plus all this stuff uses the ancient Python 3.10 from 2021 instead of Python 3.12 from last year -- and everything ships with Python 3.12 as the minimum now. So far the best way I've been able to get a lot of things working (and this issue isn't even specific to AMD -- I'm having to do it on a separate computer with a RTX 3060Ti) is to use the older 3.10 version of Miniconda to use a separate user environment for Python 3.10. Using 3.12 -- even if you use the break system packages option -- causes a lot of conflicts and some things refuse to install at all.

Once I solved that I still can't get it to a point it will use ROCm. By all rights it should. I have a "newer" GPU (7800XT has been out for almost a full year now) with official ROCm support and AI cores. As far as I know I have the proper pytorch installed in the venv and the user miniconda (yeah, wasting 2x the space.) From what I saw on the webui.sh file I wonder if it just has a short whitelist that doesn't include "newer" cores like the 7800XT and 7900XT and so it just simply refuses to continue on incorrect principle.

If there's anything else I'm missing, I sure would like to know. Whatever the case may be, the "automatic install" definitely needs to be updated to work with newer systems where Python 3.10 isn't even a native option.

Nazosan avatar Jun 02 '24 13:06 Nazosan

Plus all this stuff uses the ancient Python 3.10 from 2021 instead of Python 3.12 from last year -- and everything ships with Python 3.12 as the minimum now. So far the best way I've been able to get a lot of things working (and this issue isn't even specific to AMD -- I'm having to do it on a separate computer with a RTX 3060Ti) is to use the older 3.10 version of Miniconda to use a separate user environment for Python 3.10.

Yes, that's why almost every AI tool recommends using a Conda environment. I think it would be a good idea to suggest the use of Anaconda/Miniconda for every installation; maybe we can open an issue for that. Now that Python 3.10 is getting old, it's becoming almost mandatory.
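For anyone who hasn't used it, a minimal sketch of that workflow (the environment name sd-webui is arbitrary):

# create and activate an isolated Python 3.10 environment
conda create -n sd-webui python=3.10
conda activate sd-webui
# launch the webui from inside that environment so its venv is built on 3.10
./webui.sh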

Once I solved that I still can't get it to a point it will use ROCm. By all rights it should. I have a "newer" GPU (7800XT has been out for almost a full year now) with official ROCm support and AI cores. As far as I know I have the proper pytorch installed in the venv and the user miniconda (yeah, wasting 2x the space.) From what I saw on the webui.sh file I wonder if it just has a short whitelist that doesn't include "newer" cores like the 7800XT and 7900XT and so it just simply refuses to continue on incorrect principle.

Not really: ROCm officially supports only the 7900 XT and XTX on the "consumer" side. Every other relatively recent GPU should work too, but isn't officially supported.

In webui.sh there are some workarounds for older cards; newer cards like the 7900 XT and 7900 XTX just work without them.

In your case, most likely you need to add this to your webui-user.sh:

HSA_OVERRIDE_GFX_VERSION=11.0.0

Let me know if that works. If it does, the definitive solution would be adding that flag for every RX 6000 and RX 7000 card other than the 7900 XT and XTX.

If there's anything else I'm missing, I sure would like to know. Whatever the case may be, the "automatic install" definitely needs to be updated to work with newer systems where Python 3.10 isn't even a native option.

That wouldn't be possible right now. Sadly, many dependencies don't work on Python 3.12.

Probably 3.11 could work too, but last time I checked, many packages couldn't be installed with 3.12.

DGdev91 avatar Jun 03 '24 09:06 DGdev91

Yes, that's why almost every AI tool recommends using a Conda environment. I think it would be a good idea to suggest the use of Anaconda/Miniconda for every installation; maybe we can open an issue for that. Now that Python 3.10 is getting old, it's becoming almost mandatory.

I'm using Miniconda3 with Python 3.10. The fact is, any large modern Python project is dependency hell. I found out the hard way (because of an insightface error on a different SD system) that I had to manually install pydantic==2.7.1 and fastapi==0.111.0 in the venv even on the CUDA system.

This is reaching a point it's unsustainable and not really suitable for your average user. The list of instructions to install even in Windows is growing, but, more importantly, the number of packages becoming more and more incompatible with other packages is growing with issues like the above where things have to be one very specific version or it fails. Eventually that one version won't work with something else and that will be it. Python 3.12 has been out quite some time now and even LTS systems are shipping with it now, so it really is time to start supporting it.

That said, I'm just saying that as an aside. It's going to become a major problem at some point using an old version of Python not available in many systems directly, especially as third party solutions like Anaconda/Miniconda. The new system a bunch are using where pip is effectively disabled is especially going to become troublesome. But all that said, I'm using Miniconda with Python 3.10. I just think the average joe is going to be very confused by all this. (Heck, I was pretty confused at first when I first came across instructions for setting it up and I consider myself more of a power user.) Well, Python is a hot mess no matter what you use and I get that. I just think it should be somewhere on the backburner to get this thing running on 3.12 as that is now effectively LTS for all intents and purposes.

Not really: ROCm officially supports only the 7900 XT and XTX on the "consumer" side. Every other relatively recent GPU should work too, but isn't officially supported.

ROCm supports the 7800XT and 7700XT officially in the official lists made by AMD (whom I consider an authoritative source on ROCm.) I did run across an old article that said Stable Diffusion was using ROCm 5.2 at that time, but I assumed it must surely be higher since then, right? ROCm release 5.6.0 added ROCm 5.5 support for GFX1101 (Navi32) -- aka the 7800XT (yeah, that's confusing. Release 5.6.0 makes it work on things that use 5.5, so I guess that means it may not work if something is using 5.2.) The current ROCm version for Windows is 5.7 and Linux is on 6.1 with 6.2 probably just around the corner. If it's still using 5.2 it's time for an update as this greatly affects hardware support and since even Windows has 5.7 there is no reason to hold back further than that (though ideally it really should support at least 6.0 in Linux as 6.0 is already in a number of older LTS distros without doing a major upgrade.)

I would like to add here that other inferencing things such as llama.cpp Just Works(tm). I build the HIP version and presto, it works. No manual overrides, trying to trick it into thinking I have a 7900XT, or etc. KoboldCPP_ROCm (uses llama.cpp) also Just Works(tm) in Windows without special overrides or anything (I am not messing with the toolkit stuff to manually build anything in Windows.) Of course, that's using C++ for most functions and just has a simple bit of Python for an interface rather than anything to do with accelerating.

In webui.sh there are some workarounds for older cards; newer cards like the 7900 XT and 7900 XTX just work without them.

In your case, most likely you need to add this to your webui-user.sh:

HSA_OVERRIDE_GFX_VERSION=11.0.0

Of course I did do that. That's listed multiple times across multiple places as a supposed solution. It just says the usual: RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check. My guess is it's not installing the correct torch, but it's kind of a pain to figure out how to manually override this, since some packages are installed from manually entered URLs to pre-built wheels, and Python has elected to disable pip's ability to search packages in a normal fashion (the heck???).

And before anyone suggests it: no, I do not want to use DirectML on a GPU that has decent AI cores that I have verified work great in other things.

As a sidenote, interestingly HSA_OVERRIDE_GFX_VERSION=11.0.0 does appear to kind of work with stable-diffusion-webui-forge, but that is a whole different beast with its own issues for me. (The biggest of which applies to both my nVidia and my AMD systems, which sucks because it's FAST.)

Nazosan avatar Jun 03 '24 22:06 Nazosan

Yes, that's why almost every AI tool recommends using a Conda environment. I think it would be a good idea to suggest the use of Anaconda/Miniconda for every installation; maybe we can open an issue for that. Now that Python 3.10 is getting old, it's becoming almost mandatory.

I'm using Miniconda3 with Python 3.10. The fact is, any large modern Python project is dependency hell. I found out the hard way (because of an insightface error on a different SD system) that I had to manually install pydantic==2.7.1 and fastapi==0.111.0 in the venv even on the CUDA system.

This is reaching a point it's unsustainable and not really suitable for your average user. The list of instructions to install even in Windows is growing, but, more importantly, the number of packages becoming more and more incompatible with other packages is growing with issues like the above where things have to be one very specific version or it fails. Eventually that one version won't work with something else and that will be it. Python 3.12 has been out quite some time now and even LTS systems are shipping with it now, so it really is time to start supporting it.

That said, I'm just saying that as an aside. It's going to become a major problem at some point using an old version of Python not available in many systems directly, especially as third party solutions like Anaconda/Miniconda. The new system a bunch are using where pip is effectively disabled is especially going to become troublesome. But all that said, I'm using Miniconda with Python 3.10. I just think the average joe is going to be very confused by all this. (Heck, I was pretty confused at first when I first came across instructions for setting it up and I consider myself more of a power user.) Well, Python is a hot mess no matter what you use and I get that. I just think it should be somewhere on the backburner to get this thing running on 3.12 as that is now effectively LTS for all intents and purposes.

Well, it was never really intended for the average user. All this stuff is mostly intended for developers, or at least "geeky" people. I think that for SD to really become something within everybody's reach, it needs a change of approach. StableDiffusion.cpp, for example, is a promising project and has far fewer problems with dependencies, but it still has some major issues (for example, not being able to generate at higher resolutions even if there's still plenty of VRAM). I'm trying to make an interface for it in my free time, but it isn't ready yet.

Anyway, all this dependency mess isn't really something this project can fix. The problems are in some libraries which are still actively used but are not being upgraded.

Maybe there's a way to make it work on 3.12 now that PyTorch supports it, but last time I checked it still had some issues with some packages.

Anyway, this is one of the reasons why conda was made. Probably the right way to handle this is just to change the docs a bit to steer users toward it (many people are already using it anyway).

Not really: ROCm officially supports only the 7900 XT and XTX on the "consumer" side. Every other relatively recent GPU should work too, but isn't officially supported.

ROCm supports the 7800XT and 7700XT officially in the official lists made by AMD (whom I consider an authoritative source on ROCm.) I did run across an old article that said Stable Diffusion was using ROCm 5.2 at that time, but I assumed it must surely be higher since then, right? ROCm release 5.6.0 added ROCm 5.5 support for GFX1101 (Navi32) -- aka the 7800XT (yeah, that's confusing. Release 5.6.0 makes it work on things that use 5.5, so I guess that means it may not work if something is using 5.2.) The current ROCm version for Windows is 5.7 and Linux is on 6.1 with 6.2 probably just around the corner. If it's still using 5.2 it's time for an update as this greatly affects hardware support and since even Windows has 5.7 there is no reason to hold back further than that (though ideally it really should support at least 6.0 in Linux as 6.0 is already in a number of older LTS distros without doing a major upgrade.)

Errr... Not really. For Linux, the official docs mention the 7900 XT and 7900 XTX only: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

For Windows, there's official support for your card too: https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html

But that's a bit different... For now Windows supports HIP but not MIOpen/PyTorch, which Stable Diffusion uses. That's why SD for Windows uses DirectML, which is much slower.

Anyway, you should probably use the latest versions of ROCm and PyTorch. The current stable PyTorch needs ROCm 6.

I would like to add here that other inferencing things such as llama.cpp Just Works(tm). I build the HIP version and presto, it works. No manual overrides, trying to trick it into thinking I have a 7900XT, or etc. KoboldCPP_ROCm (uses llama.cpp) also Just Works(tm) in Windows without special overrides or anything (I am not messing with the toolkit stuff to manually build anything in Windows.) Of course, that's using C++ for most functions and just has a simple bit of Python for an interface rather than anything to do with accelerating.

Because those aren't using PyTorch and MIOpen. They all rely on hipBLAS, which is still a ROCm component but more widely supported, and it's available on Windows too.

The StableDiffusion.cpp I mentioned before uses that too.

HSA_OVERRIDE_GFX_VERSION=11.0.0

Of course I did do that. That's listed multiple times across multiple places as a supposed solution. It just says the usual: RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check. My guess is it's not installing the correct torch, but it's kind of a pain to figure out how to manually override this, since some packages are installed from manually entered URLs to pre-built wheels, and Python has elected to disable pip's ability to search packages in a normal fashion (the heck???).

And before anyone suggests it: no, I do not want to use DirectML on a GPU that has decent AI cores that I have verified work great in other things.

Sorry, it was "export HSA_OVERRIDE_GFX_VERSION=11.0.0", not just "HSA_OVERRIDE_GFX_VERSION=11.0.0". Maybe it doesn't change anything, but it's worth trying.

Also, to make sure you have the right PyTorch version:

First, activate your venv (you don't need to activate conda if the venv was made within the conda environment in the first place): source venv/bin/activate. Then run: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0 --upgrade --force-reinstall (torchaudio isn't really needed unless you use stuff like Riffusion, but it's very small, so I'll leave it there just in case).

It can also be useful to uncomment the TORCH_COMMAND variable in webui-user.sh, to prevent it from re-installing the wrong torch version:

TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"
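Put together, a sketch of the whole sequence, assuming a standard install layout and starting from the webui directory:

# activate the webui's virtual environment
source venv/bin/activate
# replace whatever torch is in the venv with the ROCm 6.0 build
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0 --upgrade --force-reinstall
# and in webui-user.sh, so later launches don't pull a different build back in:
# export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"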

DGdev91 avatar Jun 04 '24 07:06 DGdev91

Errr... Not really. For linux, the official docs mention the 7900XT and 7900XTX only. https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html

7800XT is RDNA3. The ROCm 5.6 release explicitly mentions adding it. rocminfo lists it as a supported device. Programs like llama.cpp which use it for acceleration when searching for "ROCm devices" list it as an available ROCm device with compute capabilities and thus it doesn't need special overrides or etc as with pre-ROCm devices. Other things that can use compute acceleration such as image resizers recognize it as a ROCm device and utilize it for acceleration. GFX1101 is explicitly mentioned in the table of devices. The official ROCm release itself physically installed on my system returns "GFX1101" rather than "error" when I run /opt/rocm/llvm/bin/amdgpu-arch to query devices (meaning it's a known device to the ROCm library itself.) The 7800XT specifically advertises having compute/AI cores. Its core is effectively just an updated but slightly slower version of the same chip as the 7900XT just with fewer compute cores, smaller RAMBUS, etc, but same basic capabilities.

I think at this point we can move on from "is this device an official ROCm device even though it isn't mentioned in one specific document?" to "this is obviously a ROCm device, so why isn't this one specific program recognizing it as such when everything else does?" and to be fair "this one specific program" is an old Python package, so somehow I suspect we can apply Occam's Razor here and assume culprit is more likely to be the outdated Python package rather than working under the assumption that AMD has very carefully blacklisted this in specific functions that would confuse only Stable Diffusion.

Sorry, it was "export HSA_OVERRIDE_GFX_VERSION=11.0.0", not just "HSA_OVERRIDE_GFX_VERSION=11.0.0". Maybe it doesn't change anything, but it's worth trying.

Yes, it is export in the webui-user.sh file, just as per the previous instructions. I know the variable wouldn't carry over into the environment beyond that one script without the export.

As a side note, GFX1101 is actually compute 11.0.3, but changing to that doesn't affect anything either. (Just thought I'd try it.)

Then run: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0 --upgrade --force-reinstall (torchaudio isn't really needed unless you use stuff like Riffusion, but it's very small, so I'll leave it there just in case).

It can also be useful to uncomment the TORCH_COMMAND variable in webui-user.sh, to prevent it from re-installing the wrong torch version:

TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"

I think we're getting closer here. Although, shouldn't there be a ROCm 6.1 version? I thought I saw one listed in some older instructions that had us manually putting in URLs, but changing that URL from 6.0 to 6.1 doesn't work. Not sure if it matters. Anyway, it must indeed not have been using the correct torch before, because this has gotten me further along. With the above override it now gets further into startup, but then I get this: rocBLAS error: Cannot read /downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1101

Out of curiosity I copied TensileLibrary_lazy_gfx1100.dat over to TensileLibrary_lazy_gfx1101.dat, but this seems to have not worked, which is likely no surprise. Those are binary files, so chances are it needs to be specifically compiled for the correct device and not simply copied over. But I tried it anyway. Now it starts, but things like trying to load a model produce an error with a traceback that states:

Exception in thread Thread-2 (load_model):                                                                                                                      
Traceback (most recent call last):                                              
  File "/home/nazo/miniconda3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner                                                                   
    self.run()                                                                                                                                                  
  File "/home/nazo/miniconda3.10/lib/python3.10/threading.py", line 953, in run                                                                                 
    self._target(*self._args, **self._kwargs)                             
  File "/downloads/stable-diffusion-webui/modules/initialize.py", line 154, in load_model                   
    devices.first_time_calculation()                                                                                                                            
  File "/downloads/stable-diffusion-webui/modules/devices.py", line 271, in first_time_calculation                                                              
    conv2d(x)                                                                                                                                                   
  File "/downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl                       
    return self._call_impl(*args, **kwargs)                                                                                                                     
  File "/downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)                                                                                                                        
  File "/downloads/stable-diffusion-webui/extensions-builtin/Lora/networks.py", line 518, in network_Conv2d_forward
    return originals.Conv2d_forward(self, input)                                                                                                                
  File "/downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)                                                                                                    
  File "/downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward                               
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: HIP error: invalid device function                          
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3.                   
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.

Possibly some specific component of torch or torch itself needs to be manually compiled. I tried compiling PyTorch itself from https://github.com/pytorch/pytorch, but got an error in building.

Nazosan avatar Jun 04 '24 22:06 Nazosan

7800XT is RDNA3. The ROCm 5.6 release explicitly mentions adding it. rocminfo lists it as a supported device. Programs like llama.cpp which use it for acceleration when searching for "ROCm devices" list it as an available ROCm device with compute capabilities and thus it doesn't need special overrides or etc as with pre-ROCm devices. Other things that can use compute acceleration such as image resizers recognize it as a ROCm device and utilize it for acceleration. GFX1101 is explicitly mentioned in the table of devices. The official ROCm release itself physically installed on my system returns "GFX1101" rather than "error" when I run /opt/rocm/llvm/bin/amdgpu-arch to query devices (meaning it's a known device to the ROCm library itself.) The 7800XT specifically advertises having compute/AI cores. Its core is effectively just an updated but slightly slower version of the same chip as the 7900XT just with fewer compute cores, smaller RAMBUS, etc, but same basic capabilities.

I think at this point we can move on from "is this device an official ROCm device even though it isn't mentioned in one specific document?" to "this is obviously a ROCm device, so why isn't this one specific program recognizing it as such when everything else does?" and to be fair "this one specific program" is an old Python package, so somehow I suspect we can apply Occam's Razor here and assume culprit is more likely to be the outdated Python package rather than working under the assumption that AMD has very carefully blacklisted this in specific functions that would confuse only Stable Diffusion.

That's not what I said. There is indeed some support in ROCm's code for that GPU, and it can definitely work in some areas, but it still doesn't have FULL support for every feature; for example, there isn't yet full support in the Tensile libraries for your architecture. That's why I recommended using "export HSA_OVERRIDE_GFX_VERSION=11.0.0": it should, in theory, make it use the libs for gfx1100 (RX 7900 XT/XTX). And that's also why AMD included the 7800 XT (and many other GPUs) in the supported list for the HIP SDK on Windows, but not for the full ROCm on Linux.

Some tools like llama.cpp, koboldcpp and StableDiffusion.cpp can work natively (even on Windows) because they don't need those features; they rely only on hipBLAS, while Stable Diffusion uses PyTorch, which needs MIOpen and the Tensile libs.

...there was also a fix on the Tensile repo to make it build, by default, the libraries for architectures which still don't have optimized logic: https://github.com/ROCm/Tensile/pull/1897

Sadly, that fix isn't yet included in the packages for most distributions.

I think we're getting closer here. Although, shouldn't there be a ROCm 6.1 version? I thought I saw one listed in some older instructions that had us manually putting in URLs, but changing that URL from 6.0 to 6.1 doesn't work. Not sure if it matters. Anyway, it must indeed not have been using the correct torch before, because this has gotten me further along. With the above override it now gets further into startup, but then I get this: rocBLAS error: Cannot read /downloads/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1101

No, it must be 6.0, because the current stable PyTorch has been compiled against ROCm 6.0. If you have ROCm 6.1 installed it's still fine, but the URL should be as I wrote it.

There's also a pre-release version compiled against ROCm 6.1, but it had a bug which made the webui crash at startup. It should be fixed now; I didn't try it. https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/15874

If you want to try it, the pip command is: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1

It shouldn't really change anything compared to the stable version.

The real issue here is that ROCm seems to be trying to load the Tensile libs for gfx1101 even with the HSA_OVERRIDE set.

Try this: run export HSA_OVERRIDE_GFX_VERSION=11.0.0 in a terminal and then, in that same terminal, launch the webui. Let's see if it works this way...
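In other words, something like this from the webui directory:

# set the override for this shell session only
export HSA_OVERRIDE_GFX_VERSION=11.0.0
# launch the webui from the same shell so it inherits the variable
./webui.sh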

DGdev91 avatar Jun 06 '24 01:06 DGdev91

Ok, so I just did a new full system installation. I needed to change drives and distros (went to the latest Manjaro). Anyway, trying to be sure it would be done right, I did everything in a very deliberate manner. First, I installed Miniconda for 3.10 and installed the requirements in it without a venv. Probably unnecessary and a waste of space, but just to be thorough. Second, I made sure to edit webui-user.sh to put in the corrected torch install command with the ROCm wheel before ever running webui.sh. Not sure which of these things was the key, but I'm thinking it might actually be installing the requirements outside the venv before ever running it.

I did still have to have export HSA_OVERRIDE_GFX_VERSION=11.0.0 in the webui-user.sh to keep it from trying to load the GFX1101 file not included in that build of pytorch.

As a side note, I'm able to use --medvram to get decent resolutions on this GPU. My 3060Ti in another system requires --lowvram and still OOMs on high resolutions in Linux so I have to use Windows (I wish we could control the CUDA/HIP OOM vs swapping functionality -- SD would be better swapping even though it is slow.) --upcast-sampling seems to work fine instead of using --precision full --no-half.

For some reason I'm getting low quality images. Might be missing a VAE or something. (It's kind of like they were generated with an extremely low number of steps -- like 10 or something, rather than the 80 I usually find to be the best quality balance.) I'll figure that out later. Meanwhile, at least we can consider the 7800 XT working on Linux with the right setup. The trick is just getting that right setup, whatever that actually is.
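For reference, roughly what the relevant webui-user.sh lines look like for me now (the TORCH_COMMAND is the ROCm 6.0 one suggested earlier):

# pretend to be gfx1100 so the prebuilt ROCm kernels get used on the 7800XT
export HSA_OVERRIDE_GFX_VERSION=11.0.0
# make sure the webui installs the ROCm build of torch instead of the CUDA one
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"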

Nazosan avatar Jun 06 '24 10:06 Nazosan

I did still have to have export HSA_OVERRIDE_GFX_VERSION=11.0.0 in the webui-user.sh to keep it from trying to load the GFX1101 file not included in that build of pytorch.

The problem is in ROCm, not PyTorch. But whatever....

As a side note, I'm able to use --medvram to get decent resolutions on this GPU. My 3060Ti in another system requires --lowvram and still OOMs on high resolutions in Linux so I have to use Windows (I wish we could control the CUDA/HIP OOM vs swapping functionality -- SD would be better swapping even though it is slow.) --upcast-sampling seems to work fine instead of using --precision full --no-half.

Great. In theory, with 16 GB of VRAM the 7800 XT should be fine in most cases even without --medvram, but feel free to keep it if you are hitting OOMs.

I also suggest using --opt-sub-quad-attention; it should be a bit more memory efficient. There's also --opt-sdp-attention, which is faster, but it gives more OOMs at higher resolutions.
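For example, something along these lines in webui-user.sh (the exact flags are up to you; keep or drop --medvram depending on whether you hit OOMs):

export COMMANDLINE_ARGS="--medvram --upcast-sampling --opt-sub-quad-attention"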

For some reason I'm getting low quality images. Might be missing a VAE or something. (It's kind of like they were generated with an extremely low number of steps -- like 10 or something, rather than the 80 I usually find to be the best quality balance.) I'll figure that out later. Meanwhile, at least we can consider the 7800 XT working on Linux with the right setup. The trick is just getting that right setup, whatever that actually is.

80 is a bit too much; I usually never go over 50. But that probably isn't the issue.

Also, make sure the model you are using isn't based on something like SDXL-Turbo or SDXL-Lightning. Those models are optimized for working with fewer steps and need different settings.

I usually use 10 (or even fewer) steps, with CFG scale set to 1 or 1.5 (instead of 7) and LCM as the sampler.

You can also get the same effect on any SD 1.5 or SDXL model by using the LCM LoRA (there's one for SD 1.5 and one for SDXL).

DGdev91 avatar Jun 06 '24 10:06 DGdev91

More steps help with the detail of finer things such as hands. I've found 80 to be the best balance: a really high number that actually gets hands right more often than not, at least on my 3060 Ti running SD in Windows. The effects on details are pretty extreme as you increase steps. At low step counts lots of details are left out and things like hands are messed up a lot. At medium counts you start getting finer details and hands get better. Then as you go high it starts to fill in all sorts of little things -- especially on the high-res fix pass. My experience from messing around has been that 80 is about the best balance between a high step count and not taking all day to run.

Which is why it's weird that it's set to 80, actually does 80 steps (yeah, I checked the console) and looks more like 20 or something. Details are lacking, things are aligned wrong, etc etc. It's hard to explain well, but it really is an order of magnitude difference in details.

Anyway, I'll look into that in greater detail later. My primary goal was just to get it working. It's still more effective to use the 3060Ti across the LAN since it's basically dedicated to running SD in Windows. High res is slow as heck because of the swapping, but I don't think it's actually slower than the same resolution on this card right now -- at least with the current settings like --medvram which slow it down some obviously. (But high resolutions like 1536x2048 still OOM without it. Or worse, break the whole system and I have to reboot...) This is moving beyond the scope of this issue so I'll leave that alone from here.

Anyway, my best guess as to what fixed it is installing the ROCm PyTorch in the Miniconda environment before creating the venv, but I definitely recommend fixing the torch install command in webui-user.sh before running it the first time, to be sure.

Nazosan avatar Jun 06 '24 22:06 Nazosan

Hello! I tried to add set COMMANDLINE_ARGS=--use-directml --reinstall-torch and export HSA_OVERRIDE_GFX_VERSION=11.0.0 to my webui-user.sh, but it didn't help. My video card is a 7900 XTX and I am still getting this error: RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check. Ubuntu 24.04, ROCm v6.7. What can I do to make it work?

c1tr00z avatar Jun 15 '24 09:06 c1tr00z

You could try what I just did above, but you have three oddities going on there.

  1. You have it set to use DirectML, which won't use HIP.
  2. AMD doesn't support Ubuntu 24.04 yet. I don't know what the heck is going on with that. You can't install the drivers on it except by regressing a number of things. Someone fell asleep at the wheel. You'll have the kernel driver for basic acceleration, but no extras like ROCm installed when you install Ubuntu 24.04.
  3. The current ROCm version for Linux is 6.1. There is no 6.7. Are you confusing it with 5.7 (the current Windows version)?

Not sure what the ETA is on an updated package for Ubuntu. It has been out a long time now and is an LTS release, so that should have been the highest priority, get-it-out-the-same-day kind of thing, yet here we are. (Someone needs some wakeup calls over there.)

Someone said they got the latest Jammy driver to install. I find this highly unlikely given the very different (incompatible) packages after some big changes like the Python mess (god, I hate Python now), but you could give it a try: https://repo.radeon.com/amdgpu-install/latest/ubuntu/jammy/ You could take a risk and force it to install, but that could break lots of things, so be careful. My suspicion is the only way to get a fully working configuration is to set up 22.04, fully install things, do very careful upgrades of specific things, then do a distro upgrade.

As a side note, I'm using Manjaro and packages for ROCm 6.1 Just Work(tm).

Nazosan avatar Jun 15 '24 20:06 Nazosan

Hello! I tried to add set COMMANDLINE_ARGS=--use-directml --reinstall-torch and export HSA_OVERRIDE_GFX_VERSION=11.0.0 to my webui-user.sh, but it didn't help. My video card is a 7900 XTX and I am still getting this error: RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check. Ubuntu 24.04, ROCm v6.7. What can I do to make it work?

There's no ROCm 6.7 yet; did you mean 5.7? Or maybe 6.1? Be sure you are using the latest version available, which right now should be 6.1.2.

Don't use "--use-directml" on Linux. You also don't need "HSA_OVERRIDE_GFX_VERSION=11.0.0" for the 7900 XTX, and --reinstall-torch shouldn't be needed either.

Add this to your webui-user.sh file:

TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"
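So a minimal webui-user.sh sketch for a 7900 XTX on Linux would be roughly:

# ROCm build of torch; the 7900 XTX is natively supported, so no HSA override needed
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0"
# leave COMMANDLINE_ARGS empty, or at least drop --use-directml and --reinstall-torch
export COMMANDLINE_ARGS=""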

  1. You have it set to use DirectML, which won't use HIP.

No, that's wrong. He's using Linux and has ROCm installed; he *shouldn't* use DirectML. That's for Windows only and right now works far worse.

Also, I'm not 100% sure how the situation is right now for ROCm on Ubuntu, but it should work just fine on 24.04.

AMD officially supports the 22.04 version only, but that doesn't mean it can't work elsewhere. Manjaro isn't supported either, but it works fine there.

DGdev91 avatar Jun 16 '24 07:06 DGdev91

  1. You have it set to use DirectML, which won't use HIP.

No, that's wrong. He's using Linux and has ROCm installed; he *shouldn't* use DirectML. That's for Windows only and right now works far worse.

Ok, I'm just confused. How is it wrong to say that setting it to use directml will set it to not use HIP? Am I misunderstanding something?

Nazosan avatar Jun 16 '24 11:06 Nazosan

  1. You have it set to use DirectML, which won't use HIP.

No, that's wrong. He's using Linux and has ROCm installed; he *shouldn't* use DirectML. That's for Windows only and right now works far worse.

Ok, I'm just confused. How is it wrong to say that setting it to use directml will set it to not use HIP? Am I misunderstanding something?

That's not the point; DirectML is for Windows only. On Linux, he *should* use ROCm/HIP.

DGdev91 avatar Jun 16 '24 11:06 DGdev91