
[Bug]: RuntimeError: Torch is not able to use GPU

Open • asinwang opened this issue 1 year ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What happened?

$ ./webui.sh

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on debian user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc.so.4
Python 3.10.9 (main, Mar  1 2023, 18:23:06) [GCC 11.2.0]
Version: v1.3.2
Commit hash: baf6946e06249c5af9851c60171692c44ef633e0
Traceback (most recent call last):
  File "/home/debian/project/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/home/debian/project/stable-diffusion-webui/launch.py", line 29, in main
    prepare_environment()
  File "/home/debian/project/stable-diffusion-webui/modules/launch_utils.py", line 257, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check



$ nvidia-smi
Fri Jun 23 22:51:58 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:01:00.0 Off |                  Off |
|  0%   31C    P8               28W / 450W|     21MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2796      G   /usr/lib/xorg/Xorg                            8MiB |
|    0   N/A  N/A      2825      G   /usr/bin/gnome-shell                         10MiB |
+---------------------------------------------------------------------------------------+
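(Side note: the flag named in the error only bypasses the startup check rather than fixing GPU access; generation then falls back to the CPU. If that is acceptable, the flag goes into COMMANDLINE_ARGS in webui-user.sh, which webui.sh reads at startup:

export COMMANDLINE_ARGS="--skip-torch-cuda-test"
)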

Steps to reproduce the problem

none

What should have happened?

none

Commit where the problem happens

none

What Python version are you running on ?

Python 3.10.x

What platforms do you use to access the UI ?

Linux

What device are you running WebUI on?

Nvidia GPUs (RTX 20 above)

What browsers do you use to access the UI ?

No response

Command Line Arguments

none

List of extensions

none

Console logs

none

Additional information

No response

asinwang avatar Jun 23 '23 14:06 asinwang

Have run into the same issue; this is from my post in the discussion section (not sure why it went there). I tried to run webui.sh on an Ubuntu 22.04 server and get:

Traceback (most recent call last):
  File "/home/user/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/home/user/stable-diffusion-webui/launch.py", line 29, in main
    prepare_environment()
  File "/home/user/stable-diffusion-webui/modules/launch_utils.py", line 257, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

But I have a GPU (RTX 3060) and I think I have installed CUDA correctly (I have done the same in a WSL environment on the same PC and got the webui working), and oobabooga runs correctly on the GPU. I suspect it is because the PC has two GPUs: an iGPU (integrated with the AMD CPU) and the RTX 3060.

When I run sudo lshw -C display I get:

  *-display
       description: VGA compatible controller
       product: GA106 [GeForce RTX 3060]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       logical name: /dev/fb0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom fb
       configuration: depth=32 driver=nvidia latency=0 mode=3840x2160 visual=truecolor xres=3840 yres=2160
       resources: iomemory:780-77f iomemory:7c0-7bf irq:86 memory:fb000000-fbffffff memory:7800000000-7bffffffff memory:7c00000000-7c01ffffff ioport:f000(size=128) memory:fc000000-fc07ffff
  *-display
       description: VGA compatible controller
       product: Cezanne
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:0d:00.0
       logical name: /dev/fb0
       version: c9
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list fb
       configuration: depth=32 driver=amdgpu latency=0 resolution=3840,2160
       resources: iomemory:7c0-7bf iomemory:7c0-7bf irq:31 memory:7c10000000-7c1fffffff memory:7c20000000-7c201fffff ioport:e000(size=256) memory:fc500000-fc57ffff
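(A quick way to rule the iGPU in or out, as a sketch: pin CUDA to the NVIDIA card by device index before launching. Index 0 is an assumption based on the nvidia-smi listing further down:

CUDA_VISIBLE_DEVICES=0 ./webui.sh
)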

When I try to run:

import torch
import sys
print('__Python VERSION:', sys.version)
print('__pyTorch VERSION:', torch.__version__)
print('__CUDA VERSION')
from subprocess import call
print('__CUDNN VERSION:', torch.backends.cudnn.version())
print('__Number CUDA Devices:', torch.cuda.device_count())
print('__Devices')
call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
print('Active CUDA Device: GPU', torch.cuda.current_device())
print('Available devices ', torch.cuda.device_count())
print('Current cuda device ', torch.cuda.current_device())

I get an error:

__Python VERSION: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
__pyTorch VERSION: 2.0.1+rocm5.4.2
__CUDA VERSION
__CUDNN VERSION: 2019000
__Number CUDA Devices: 1
__Devices
index, name, driver_version, memory.total [MiB], memory.used [MiB], memory.free [MiB]
0, NVIDIA GeForce RTX 3060, 530.41.03, 12288 MiB, 1 MiB, 12043 MiB
Traceback (most recent call last):
  File "/home/user/data/test.py", line 12, in <module>
    print('Active CUDA Device: GPU', torch.cuda.current_device())
  File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/home/user/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

I think I have tried everything I could find, but the error persists. I got webui.sh running in a WSL environment on the same PC, so it shouldn't be a hardware issue.
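A note on the output above: the version string 2.0.1+rocm5.4.2 means the ROCm (AMD) build of PyTorch was installed into the venv. That build cannot drive an NVIDIA card, which is exactly why torch raises "No HIP GPUs are available". A one-liner to check which build a venv holds (run with the venv activated):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

A CUDA build prints something like 2.0.1+cu118.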

greagain avatar Jun 23 '23 23:06 greagain

I 'solved' the problem by copying my stable-diffusion folder from WSL2 to the dual-boot Ubuntu system. Not sure why, but it works.

greagain avatar Jun 24 '23 14:06 greagain

I think I found the solution to this issue: mount the necessary GPU-related files. Make sure to mount the appropriate NVIDIA driver files and libraries inside the container. My webui is set up in a Docker container, so I added the following volume mounts to the Docker Compose file:

services:
  stable-diffusion:
    ...
    volumes:
      ...
      - /usr/local/nvidia/lib64:/usr/local/nvidia/lib64
      - /usr/local/nvidia/bin:/usr/local/nvidia/bin
      ...

Adjust the source paths /usr/local/nvidia/lib64 and /usr/local/nvidia/bin based on the actual locations of the NVIDIA driver files on your host system.

If you don't know where they are, use these to find them:

find / -name "libnvidia-*.so" 2>/dev/null
find / -name "nvidia-smi" 2>/dev/null

I imagine this same solution can work for regular installs as well. It seems PyTorch doesn't know the locations of some NVIDIA files required to use the GPU.
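If the host has the NVIDIA Container Toolkit installed, an alternative to bind-mounting driver files is to let Docker expose the GPU directly. A quick smoke test from the host (the CUDA image tag here is just an example):

docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

If that prints the same table as nvidia-smi on the host, containers can see the GPU, and the Compose service can request it via a deploy.resources.reservations.devices block instead of manual mounts.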

Yeghro avatar Jun 25 '23 02:06 Yeghro

I use Ubuntu 22.04 with an NVIDIA GPU and had the same issue. In my case, I installed PyTorch 2.0.1 without the rocm5.4.2 suffix and it works.

You can change this line in the webui.sh file:

export TORCH_COMMAND="pip install torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2 --index-url https://download.pytorch.org/whl/rocm5.4.2"

to:

export TORCH_COMMAND="pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118"
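One caveat, as a hedged note: if the venv was already created with the ROCm wheels, changing TORCH_COMMAND alone may not replace them. Deleting the venv forces a clean reinstall (assuming the default venv location inside the repo):

cd stable-diffusion-webui
rm -rf venv        # drops the old ROCm wheels
./webui.sh         # recreates the venv and installs the CUDA build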

cuongng-99 avatar Jul 05 '23 04:07 cuongng-99

I use Ubuntu 22.04 with an NVIDIA GPU and had the same issue. In my case, I installed PyTorch 2.0.1 without the rocm5.4.2 suffix and it works.

You can change this line in the webui.sh file:

export TORCH_COMMAND="pip install torch==2.0.1+rocm5.4.2 torchvision==0.15.2+rocm5.4.2 --index-url https://download.pytorch.org/whl/rocm5.4.2"

to:

export TORCH_COMMAND="pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118"

How do I make this edit from the terminal?

sitzbrau avatar Mar 27 '24 18:03 sitzbrau
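(For anyone landing here with the same question: one way is to open webui.sh in a terminal editor, e.g. nano webui.sh, and change the TORCH_COMMAND line by hand. As a non-interactive sketch, sed can make the same replacements; verify the exact strings in your copy of webui.sh first, and keep the .bak backup it writes:

sed -i.bak \
  -e 's/torch==2.0.1+rocm5.4.2/torch==2.0.1/' \
  -e 's/torchvision==0.15.2+rocm5.4.2/torchvision==0.15.2/' \
  -e 's|whl/rocm5.4.2|whl/cu118|' \
  webui.sh
)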