
[Bug]: Ubuntu 22.04.1 | RX 6800XT | "Torch is not able to use GPU; add --skip-torch-cuda-test..."

Open bjlanger opened this issue 1 year ago • 11 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

  • Attempted to install the UI using both the automated install and the method on the wiki for AMD GPUs.

  • Previously I had installed the drivers for ROCm 5.4.2, so I started fresh on a new Ubuntu install and got the same results.

  • Output is the same whether I run the automated script or follow the wiki; shown below (screenshot attached).

  • As an addendum: I was running Arch before swapping to Ubuntu, and the automated script install went fine there. At least it seemed to; I didn't generate any images, but the web UI opened. I can try it again on Arch tomorrow and see what happens.

Steps to reproduce the problem

  1. Be running Ubuntu 22.04.1.
  2. Have a 6800 XT.
  3. Attempt the automated install / the install mentioned in the wiki for AMD GPUs.
  4. Receive the error.

What should have happened?

  • The automated install should work on Ubuntu 22.04.

Commit where the problem happens

5c1cb9263f980641007088a37360fcab01761d37

What platforms do you use to access the UI?

Linux

What browsers do you use to access the UI?

Mozilla Firefox

Command Line Arguments

No response

Additional information, context and logs

  • I did see that this seems to have been addressed in some manner in this pull request, though I'm not sure whether it covers my exact problem. It might be specific to 22.04: the ROCm version being installed is 5.1.1, and 22.04 isn't supported by it according to here.

bjlanger avatar Jan 24 '23 06:01 bjlanger

Did you try with python launch.py --skip-torch-cuda-test ?

sbersier avatar Jan 24 '23 10:01 sbersier

Yes, that'll let it proceed. However I did not need to provide that option to it on Arch Linux.

  • When attempting to generate an image after this, I get an error: RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
  • As specified in the wiki, for newer AMD cards I should not need to provide the full-precision options.
  • Also, if torch is not able to use the GPU, wouldn't this then run only on my CPU? It should be using the torch build for ROCm.
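One quick way to tell which torch build actually landed in the venv is to look at the version suffix a ROCm wheel reports. A minimal sketch (is_rocm_build is a hypothetical helper name; the version string fed in is the one mentioned later in this thread):

```shell
# A ROCm wheel of torch reports a version like 1.13.1+rocm5.1.1.
# If the suffix is missing (or reads +cu...), the CPU or CUDA build was
# installed and torch will silently fall back to the CPU.
is_rocm_build() {
    case "$1" in
        *+rocm*) echo "rocm build" ;;
        *)       echo "NOT a rocm build" ;;
    esac
}

# On a real install, feed it the live version string:
#   is_rocm_build "$(python -c 'import torch; print(torch.__version__)')"
is_rocm_build "1.13.1+rocm5.1.1"   # -> rocm build
```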

bjlanger avatar Jan 24 '23 14:01 bjlanger

try this for cpu first: python launch.py --skip-torch-cuda-test --use-cpu all --precision full --no-half

ClashSAN avatar Jan 24 '23 16:01 ClashSAN

Ref: (https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) Since your RX 6800 XT belongs to the RX 6000 series, I would try: python launch.py --skip-torch-cuda-test --precision full --no-half. It is said: "As of 1/15/23 you can just run webui-user.sh and pytorch+rocm should be automatically installed for you." So I guess it means that it will use the GPU with ROCm (instead of CUDA).

sbersier avatar Jan 24 '23 17:01 sbersier

I have the exact same issue. OS: Linux Mint; GPU: 6900 XT.

I have followed the wiki page. pip list inside the venv shows that torch 1.13.1+rocm5.1.1 and torchvision 0.14.1+rocm5.1.1 are installed.

borsoe avatar Jan 24 '23 17:01 borsoe

Y'all might have more luck with newer ROCm packages than what Automatic is installing by default.

Arch Linux, in particular, has some native ROCm libs in its testing repository right now: https://archlinux.org/packages/?sort=&q=Rocm&maintainer=&flagged=

And you can install versions of pytorch for the new rocm builds from pytorch's nightly repo.

brucethemoose avatar Jan 24 '23 17:01 brucethemoose

Ref: (https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) Since your RX6800XT GPU belongs to the RX6000 series, I would try python launch.py --skip-torch-cuda-test --precision full --no-half It is said: "As of 1/15/23 you can just run webui-user.sh and pytorch+rocm should be automatically installed for you." So, I guess it means that it will use the GPU with rocm (instead of CUDA)

webui-user.sh just has everything commented out. Even after uncommenting the relevant sections, it doesn't do anything. I'm not sure what is going on with those directions, but they don't work for me.

VorpalQ avatar Jan 24 '23 18:01 VorpalQ

If you've installed pytorch+rocm correctly and activated the venv and the cuda device is still not available, you might have missed this:

sudo usermod -aG render YOURLINUXUSERNAME
sudo usermod -aG video YOURLINUXUSERNAME

Reboot afterwards!

You need to add your user to the render group to have permission to schedule kernels on your GPU. BTW, save yourself some time and just run ./webui.sh --no-half-vae to start it up.

My 6900xt is working fine on all 3 available torch+rocm versions:

pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2
pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.3
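The group check above can be sketched as a small helper (check_groups is a hypothetical name; on a real system you would feed it the output of id -nG):

```shell
# Print, for each group ROCm needs (render and video), whether it appears
# in a space-separated group list such as the output of `id -nG`.
check_groups() {
    for g in render video; do
        case " $1 " in
            *" $g "*) echo "$g: ok" ;;
            *)        echo "$g: MISSING -- run: sudo usermod -aG $g YOURLINUXUSERNAME and reboot" ;;
        esac
    done
}

# Real usage:  check_groups "$(id -nG)"
check_groups "sudo video docker"
```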

Jan-Huber avatar Jan 24 '23 20:01 Jan-Huber

If you've installed pytorch+rocm correctly and activated the venv and the cuda device is still not available, you might have missed this:

sudo usermod -aG render YOURLINUXUSERNAME
sudo usermod -aG video YOURLINUXUSERNAME

Reboot afterwards!

You need to add your user to the render group to have permission to schedule kernels on your GPU. BTW, save yourself some time and just run ./webui.sh --no-half-vae to start it up.

My 6900xt is working fine on all 3 available torch+rocm versions:

pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.1.1
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2
pip3 install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.3

That worked! Thanks! That info should probably be added to the wiki entries; I didn't see it in there, at least not in the guide concerning AMD cards.

bjlanger avatar Jan 24 '23 23:01 bjlanger

I'm using an RX 5500 XT (8GB) and Ubuntu 22.04.1, but I can't generate any images; I get this error:

MIOpen(HIP): Warning [SQLiteBase] Missing system database file: gfx1030_11.kdb Performance may degrade. Please follow instructions to install: https://github.com/ROCmSoftwarePlatform/MIOpen#installing-miopen-kernels-package

And unfortunately, this hasn't yet been reported to this repo.

The following is what I did during the installation:

  • What @Jan-Huber said
    • sudo usermod -aG render (my username)
    • sudo usermod -aG video (my username)
  • https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
    • git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
    • Placed stable diffusion checkpoint (model.ckpt) in the models/Stable-diffusion directory
    • Ran webui.sh (without adding COMMANDLINE_ARGS=--precision full --no-half in webui-user.sh)
  • To avoid the "middleware" error
    • source ./venv/bin/activate
    • pip install fastapi==0.90.1
    • deactivate

cyatarow avatar Feb 11 '23 15:02 cyatarow

I had a similar issue and, FWIW, I believe it is due to how the startup sequence works in Automatic: currently it looks for an AMD device first and, if one is found, just proceeds to try and use that even if CUDA devices are present. I worked around it by modifying the script manually.

My issue was that I have an AMD CPU but running Nvidia GPUs and it was refusing to use them despite CUDA etc. all being correctly configured.

This is the section of the code which causes the issue.

https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/0cc0ee1bcb4c24a8c9715f66cede06601bfc00c8/webui.sh#L106-L122

# Check prerequisites
gpu_info=$(lspci 2>/dev/null | grep VGA)
case "$gpu_info" in
    *"Navi 1"*|*"Navi 2"*) export HSA_OVERRIDE_GFX_VERSION=10.3.0
    ;;
    *"Renoir"*) export HSA_OVERRIDE_GFX_VERSION=9.0.0
        printf "\n%s\n" "${delimiter}"
        printf "Experimental support for Renoir: make sure to have at least 4GB of VRAM and 10GB of RAM or enable cpu mode: --use-cpu all --no-half"
        printf "\n%s\n" "${delimiter}"
    ;;
    *) 
    ;;
esac
if echo "$gpu_info" | grep -q "AMD" && [[ -z "${TORCH_COMMAND}" ]]
then
    export TORCH_COMMAND="pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2"
fi  
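For anyone who wants to see which branch of that detection their own machine would hit, the case logic can be lifted into a standalone sketch (detect_gpu is a hypothetical name; the patterns are copied from the snippet above):

```shell
# Echo the override webui.sh's detection would apply for a given
# lspci-style VGA line.
detect_gpu() {
    case "$1" in
        *"Navi 1"*|*"Navi 2"*) echo "HSA_OVERRIDE_GFX_VERSION=10.3.0" ;;
        *"Renoir"*)            echo "HSA_OVERRIDE_GFX_VERSION=9.0.0"  ;;
        *)                     echo "no override" ;;
    esac
}

# Real usage:  detect_gpu "$(lspci | grep VGA)"
detect_gpu "VGA compatible controller: AMD Navi 21 [Radeon RX 6800/6800 XT]"
```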

My suggestion would be to fix this so that a user can specify which devices to use, in order of preference, and so that the script fully detects the environment before proceeding.
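In the meantime, because the script only assigns TORCH_COMMAND when it is empty, one workaround is to pre-set it before launching so the AMD branch leaves it alone (the cu117 index URL here is an assumption; substitute the wheel index that matches your setup):

```shell
# Pre-setting TORCH_COMMAND skips the rocm5.2 default chosen by webui.sh,
# since the script only assigns it when the variable is unset.
export TORCH_COMMAND="pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117"
echo "$TORCH_COMMAND"
# ./webui.sh   # then launch as usual
```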

darth-veitcher avatar Mar 03 '23 10:03 darth-veitcher

try this for cpu first: python launch.py --skip-torch-cuda-test --use-cpu all --precision full --no-half

This is OK. I have successfully run it, but it is running on the CPU. I will keep looking for a way to use the GPU.

mp075496706 avatar Mar 30 '23 07:03 mp075496706