
"RuntimeError: No HIP GPUs are available" on AMD 6700XT (Ubuntu 22.04.2)

Open snaeker58 opened this issue 2 years ago • 18 comments

Specifications:

OS: Ubuntu 22.04.2 LTS

Ryzen 7 3700X:

*-cpu
     description: CPU
     product: AMD Ryzen 7 3700X 8-Core Processor
     vendor: Advanced Micro Devices [AMD]
     physical id: 11
     bus info: cpu@0
     version: 23.113.0
     serial: Unknown
     slot: AM4
     size: 2794MHz
     capacity: 4426MHz
     width: 64 bits
     clock: 100MHz

AMD Radeon 6700XT:

*-display
     description: VGA compatible controller
     product: Navi 22 [Radeon RX 6700/6700 XT / 6800M]
     vendor: Advanced Micro Devices, Inc. [AMD/ATI]
     physical id: 0
     bus info: pci@0000:08:00.0
     logical name: /dev/fb0
     version: c1
     width: 64 bits
     clock: 33MHz
     capabilities: pm pciexpress msi vga_controller bus_master cap_list rom fb
     configuration: depth=32 driver=amdgpu latency=0 mode=1920x1080 resolution=2560,1080 visual=truecolor xres=1920 yres=1080

I have ROCm installed with sudo amdgpu-install --usecase=hiplibsdk,rocm, following AMD's instructions for Ubuntu 22.04.

ROCm System Management Interface - Concise Info
GPU  Temp (DieEdge)  AvgPwr  SCLK    MCLK   Fan  Perf  PwrCap  VRAM%  GPU%
0    48.0c           8.0W    500Mhz  96Mhz  0%   auto  203.0W  6%     0%
End of ROCm SMI Log

I installed ComfyUI following the Installation Guide for Linux.

Everything works fine until I prompt something. On prompting I get the following error:

got prompt
Global Step: 470000
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
ERROR STARTS HERE
Traceback (most recent call last):
  File "/home/karl/ComfyUI/execution.py", line 185, in execute
    recursive_execute(self.server, prompt, self.outputs, x, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 67, in recursive_execute
    outputs[unique_id] = getattr(obj, obj.FUNCTION)(**input_data_all)
  File "/home/karl/ComfyUI/nodes.py", line 290, in load_checkpoint
    out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
  File "/home/karl/ComfyUI/comfy/sd.py", line 970, in load_checkpoint_guess_config
    vae = VAE()
  File "/home/karl/ComfyUI/comfy/sd.py", line 513, in __init__
    device = model_management.get_torch_device()
  File "/home/karl/ComfyUI/comfy/model_management.py", line 250, in get_torch_device
    return torch.cuda.current_device()
  File "/home/karl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/home/karl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

I'm guessing this is an issue between Torch and ROCm. Any suggestions for things I could try, or possible solutions?
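
For reference, a quick sanity check (my own guess at a diagnostic, not from any guide) to see which PyTorch build is installed and whether it can see the card at all:

python3 -c "import torch; print(torch.__version__, torch.version.hip)"
# a ROCm build prints something like 2.0.1+rocm5.4.2 plus a HIP version; a CUDA build prints +cu117 and None
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"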

snaeker58 avatar May 12 '23 13:05 snaeker58

Update: I think the problem likely stems from PyTorch not yet supporting ROCm 5. Even if that's not what's causing this particular error, the mismatch should still lead to other errors.

snaeker58 avatar May 12 '23 13:05 snaeker58

Downgrading ROCm did not solve anything. I found the same issue reported for Automatic1111: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8828 I'm currently installing an older PyTorch version; maybe that will solve it. I got Auto1111 running once, but I forgot the mystery fix that made it work...

snaeker58 avatar May 12 '23 19:05 snaeker58

Hey, older PyTorch version, new error. So this is definitely a PyTorch x ROCm issue. What a surprise...

/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release
  warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
  warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release
  warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
  warnings.warn(
Global Step: 840000
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "/home/karl/ComfyUI/execution.py", line 185, in execute
    recursive_execute(self.server, prompt, self.outputs, x, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 58, in recursive_execute
    recursive_execute(server, prompt, outputs, input_unique_id, extra_data, executed)
  File "/home/karl/ComfyUI/execution.py", line 67, in recursive_execute
    outputs[unique_id] = getattr(obj, obj.FUNCTION)(**input_data_all)
  File "/home/karl/ComfyUI/nodes.py", line 290, in load_checkpoint
    out = comfy.sd.load_checkpoint_guess_config(ckpt_path, output_vae=True, output_clip=True, embedding_directory=folder_paths.get_folder_paths("embeddings"))
  File "/home/karl/ComfyUI/comfy/sd.py", line 970, in load_checkpoint_guess_config
    vae = VAE()
  File "/home/karl/ComfyUI/comfy/sd.py", line 513, in __init__
    device = model_management.get_torch_device()
  File "/home/karl/ComfyUI/comfy/model_management.py", line 250, in get_torch_device
    return torch.cuda.current_device()
  File "/home/karl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 552, in current_device
    _lazy_init()
  File "/home/karl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice

/home/karl/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:88: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
  return torch._C._cuda_getDeviceCount() > 0

snaeker58 avatar May 12 '23 19:05 snaeker58

The pytorch ROCm builds are standalone, they don't require you to have ROCm actually installed. They only require you to have a compatible kernel.

Try launching comfyui with: HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py

comfyanonymous avatar May 12 '23 19:05 comfyanonymous

The pytorch ROCm builds are standalone, they don't require you to have ROCm actually installed. They only require you to have a compatible kernel.

Try launching comfyui with: HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py

I've been doing that the entire time. Unless I've been doing it wrong by typing HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py into the terminal in the ComfyUI directory?

snaeker58 avatar May 12 '23 19:05 snaeker58

If you have ROCm installed uninstall it completely, it might be conflicting with the ROCm that comes bundled with the pytorch package.

comfyanonymous avatar May 12 '23 19:05 comfyanonymous

If you have ROCm installed uninstall it completely, it might be conflicting with the ROCm that comes bundled with the pytorch package.

Yep, I just reinstalled the entire OS to get everything completely clean and maximize my chances. Big moment in a few minutes... will it work?

snaeker58 avatar May 12 '23 20:05 snaeker58

Welp, one step forward, one step back...

New error!

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

I'm a bit lost now; this is an AMD system with a 6700XT, so why would I need an NVIDIA driver? Edit: The link just takes me to the NVIDIA drivers page, and sadly there are no Radeon 6700XT drivers there... xD

snaeker58 avatar May 12 '23 20:05 snaeker58

That error means you installed the wrong pytorch, what's your python version?
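
For example (nothing ComfyUI-specific, just the usual way to check both):

python3 --version
python3 -c "import torch; print(torch.__version__)"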

comfyanonymous avatar May 12 '23 20:05 comfyanonymous

That error means you installed the wrong pytorch, what's your python version?

2.0.1+cu117. Guessing cu stands for CUDA, not ROCm? Did I install the NVIDIA version??

snaeker58 avatar May 12 '23 20:05 snaeker58

I opened a new issue for this different error, which makes it easier to find for people who hit the same thing later: #653

snaeker58 avatar May 12 '23 20:05 snaeker58

So it must have been the wrong PyTorch. Now I'm back to the original "RuntimeError: No HIP GPUs are available" error. Clean install, followed the instructions.

snaeker58 avatar May 12 '23 21:05 snaeker58

I previously thought this issue might be down to ROCm (which, from my very limited understanding, seems to pretend to be CUDA) not working properly with PyTorch, but I did some playing around with PyTorch in Python and that seemed to work, so now I'm just confused. I have basically no prior knowledge here, especially of PyTorch and anything ROCm, so maybe I'm misunderstanding this.

This is what I did / what worked:

Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> import math
>>> x = torch.empty(3,4)
>>> print(x)
tensor([[ 6.7262e-44,  0.0000e+00,  6.7262e-44,  0.0000e+00],
        [ 2.7758e+14,  7.0065e-45,  0.0000e+00,  0.0000e+00],
        [-1.5173e+19,  4.5652e-41,  7.4849e+31,  4.5653e-41]])
>>> print(type(x))
<class 'torch.Tensor'>
>>> ones = torch.zeros(2, 2) + 1
>>> twos = torch.ones(2, 2) * 2
>>> threes = (torch.ones(2, 2) * 7 - 1) / 2
>>> fours = twos ** 2
>>> sqrt2s = twos ** 0.5
>>> print(ones)
tensor([[1., 1.],
        [1., 1.]])
>>> print(twos)
tensor([[2., 2.],
        [2., 2.]])
>>> print(threes)
tensor([[3., 3.],
        [3., 3.]])
>>> print(fours)
tensor([[4., 4.],
        [4., 4.]])
>>> print(sqrt2s)
tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])
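
(Worth noting: as far as I can tell all of those tensors live on the CPU, so this doesn't actually exercise HIP at all. Something like the following, with the same override I use for ComfyUI, would be my guess at an actual GPU test:)

HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 - <<'EOF'
import torch
print(torch.cuda.is_available())                # False here is the same failure ComfyUI hits
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # should report the 6700 XT
    print(torch.ones(2, 2, device="cuda") * 2)  # tiny op that actually runs on the GPU
EOF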

What I also realized: the instructions list HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py, but I've been running HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 main.py, because otherwise the python command isn't found... In my mind this shouldn't make a difference; do I need python-is-python3? But as far as I know that wouldn't change anything, right?

snaeker58 avatar May 12 '23 21:05 snaeker58

ROCm should add support for gfx1031; until then, ROCm and PyTorch have to be compiled manually with the -DCMAKE_HIP_ARCHITECTURES="gfx1031" or -DAMDGPU_TARGETS="gfx1031" variables... well, it's just ROCm.

Comparing precompiled PyTorch run with HSA_OVERRIDE_GFX_VERSION=10.3.0 against PyTorch compiled with real gfx1031 support (sic!), there is a difference.
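
Rough sketch of what that looks like (illustrative only; the exact flags depend on the component being built):

# for the ROCm math libraries (rocBLAS and friends), something like:
cmake -DAMDGPU_TARGETS="gfx1031" ..
# for building pytorch itself, the arch list is normally passed through an env var:
PYTORCH_ROCM_ARCH=gfx1031 python3 setup.py install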

p.s. http://reddit.com/r/AMD_Stock/comments/136duk0/upcoming_rocm_linux_gpu_os_support/

fractal-fumbler avatar May 13 '23 09:05 fractal-fumbler

I've successfully used ComfyUI with RX 6700 on Ubuntu 22.10 (shouldn't differ too much from 22.04). I did install ROCm 5.4.3, but gave up on compiling PyTorch (it's a world of pain and you actually do need ROCm installed to compile it anyway).

PyTorch precompiled against ROCm 5.5 is not yet available, and the best you can get right now is 5.4.2 (it will work with ROCm 5.4.3 but not 5.5).

If you already have a different version of ROCm installed, you might want to uninstall it first using: sudo amdgpu-uninstall --rocmrelease=all

You should be able to install ROCm using those commands:

sudo apt-get update
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/jammy/amdgpu-install_5.4.50403-1_all.deb 
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
sudo amdgpu-install --usecase=rocm,hip,mllib --no-dkms
sudo usermod -a -G video,render $LOGNAME

The last line gives your user access to the GPU, which is supposedly needed by the ROCm version of PyTorch. You need to restart Ubuntu after this. You can replace hip with hiplibsdk if you want; you'll need hiplibsdk to compile things with ROCm (like PyTorch or the Ooba Booga LLaMA plugin), but it shouldn't be needed just for running ComfyUI.
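
After the restart you can confirm the groups took effect and that the device nodes are visible (just a sanity check, not from AMD's docs):

groups $LOGNAME                    # should now list both video and render
ls -l /dev/kfd /dev/dri/renderD*   # the device nodes the ROCm runtime talks to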

In your ComfyUI folder, activate the venv, uninstall the existing PyTorch, and install PyTorch with ROCm enabled.

source venv/bin/activate
pip3 uninstall torch torchvision torchaudio
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

I'm only 90% sure about the command for activating the venv ;) I haven't done it manually in like 2 weeks. You can use pip instead of pip3 if you want (on my system there was no difference).
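
To double-check that the ROCm wheel actually got picked up (my own sanity check, not from the PyTorch docs):

pip3 show torch    # the Version line should end in +rocm5.4.2, not +cu117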

For running ComfyUI:

source venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=10.3.0
python main.py

You can save that to an .sh script and run that instead. I did something like that, but I'm writing this on Windows and don't have access to my Ubuntu machine at the moment to check.
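
Roughly like this (the file name and paths are just an example, adjust to your checkout):

#!/bin/bash
# run_comfyui.sh - example launcher for ComfyUI on a ROCm setup
cd "$(dirname "$0")"                     # assumes the script sits in the ComfyUI folder
source venv/bin/activate
export HSA_OVERRIDE_GFX_VERSION=10.3.0   # treat the gfx1031 card as a supported gfx1030 part
python main.py "$@"

Then chmod +x run_comfyui.sh once and start ComfyUI with ./run_comfyui.sh.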

P.S. I've wasted a lot of time trying to get SD to work on RX6700 on Ubuntu and then wasted even more time to get it to work on 7900 XTX.

jahu00 avatar May 20 '23 21:05 jahu00

If you have ROCm installed uninstall it completely, it might be conflicting with the ROCm that comes bundled with the pytorch package.

Emm, do I need to uninstall the AMD GPU driver? Does the AMD GPU driver contain ROCm? I am new to Ubuntu.

plane714 avatar Oct 26 '23 07:10 plane714

(quoting jahu00's ROCm 5.4.3 + PyTorch rocm5.4.2 instructions above in full)

If anyone comes here do this with just changing the version

Canahmetozguven avatar Dec 03 '23 22:12 Canahmetozguven

(quoting jahu00's instructions and Canahmetozguven's reply above in full)

I must be totally missing something, because I don't fully understand what you mean by "If anyone comes here do this with just changing the version".

I'm very, very tired of trying to get everything working properly. Sorry for asking what you meant by that, but I'd rather ask than have to start fresh again for the 100th time...

Thanks in advance!

NuffsaidNL avatar Aug 29 '24 13:08 NuffsaidNL

After following the instructions in the README, I encountered the same "No HIP GPUs are available" error. Running sudo usermod -a -G video,render $LOGNAME was enough to fix it for me.

elliotpotts avatar Jan 22 '25 05:01 elliotpotts

I tried so many things, but in the end all I had to do was to run it with sudo -_-

sudo $(which python) main.py

OjusWiZard avatar Feb 07 '25 22:02 OjusWiZard