[bug]: invokeai-rocm container doesn't support GPUs
Is there an existing issue for this problem?
- [X] I have searched the existing issues
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
RX 7900 XTX, RX 7700S
GPU VRAM
26GB, 8GB
Version number
invokeai-rocm
Browser
Firefox
Python dependencies
No response
What happened
I am trying to run the container version with these arguments:
--device /dev/kfd --device /dev/dri --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm
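For reference, the complete invocation would look something like this (assuming Docker; the --group-add flags are an assumption based on common ROCm container setups, since the container user typically needs access to the video/render device groups):
docker run --device /dev/kfd --device /dev/dri --group-add video --group-add render --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm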
However, it doesn't seem to detect either of my AMD GPUs and falls back to the CPU. It also reports that bitsandbytes was compiled without GPU support:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-10-02 05:54:19,891]::[InvokeAI]::INFO --> Patchmatch initialized
[2024-10-02 05:54:20,552]::[InvokeAI]::INFO --> Using torch device: CPU
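A quick way to check whether the torch build inside the container can see the GPU at all is to query torch directly (a diagnostic sketch, assuming python is on the container's PATH and the container is named invokeai as above):
docker exec -it invokeai python -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"
On a working ROCm build, torch.version.hip prints a version string and torch.cuda.is_available() returns True; with the CPU fallback seen here, it prints None and False.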
Ollama works fine with ROCm, so I am not sure why this doesn't, or how I can get it working.
What you expected to happen
I expect the container to start, use ROCm, and detect the GPUs
How to reproduce the problem
No response
Additional context
No response
Discord username
No response
Bare metal is affected too
Using the installer with the ROCm option on an RX 6700 XT GPU.
It works fine on < 5.0.2, but starting with 5.1 I get this error:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Clearing the pip cache doesn't help.
I made a systemd service:
[Unit]
Description=InvokeAI
[Service]
ExecStart=/home/user/.local/invokeai/.venv/bin/invokeai-web
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.95,max_split_size_mb:512"
Environment="INVOKEAI_ROOT=/home/user/.local/invokeai"
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=default.target
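For completeness, the service is loaded and started the usual way (assuming the unit is saved as a user unit, e.g. at ~/.config/systemd/user/invokeai.service, which matches the WantedBy=default.target above):
systemctl --user daemon-reload
systemctl --user enable --now invokeai.service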
Same result on bare metal: 5.1.1 doesn't detect or use the ROCm device. An in-place install of 5.0.2 restores AMD support.
None of the [version]-rocm containers work for me, even before version 5.0.2. I'm using podman with the proper arguments (I know podman is not directly supported, but I have Ollama running with this same configuration, and I also run the bare-metal installer via a rootless distrobox container, which has worked fine). Here are my arguments:
Image=ghcr.io/invoke-ai/invokeai:main-rocm
ContainerName=invokeai
AutoUpdate=registry
Environment=INVOKEAI_ROOT=/var/lib/invokeai
PublishPort=9091:9090
Volume=/var/home/user/.local/share/invokeai:/var/lib/invokeai
SecurityLabelDisable=true
AddDevice=/dev/dri
AddDevice=/dev/kfd
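For anyone not using Quadlet, the equivalent direct invocation should be roughly the following (a sketch derived from the options above; the --label flag stands in for AutoUpdate=registry):
podman run -d --name invokeai --security-opt label=disable --device /dev/dri --device /dev/kfd -e INVOKEAI_ROOT=/var/lib/invokeai -p 9091:9090 -v /var/home/user/.local/share/invokeai:/var/lib/invokeai --label io.containers.autoupdate=registry ghcr.io/invoke-ai/invokeai:main-rocm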
Using the 5.1.1 bare-metal installer also defaults to the CPU, but the 5.0.2 bare-metal installer (again under a rootless distrobox container) detects my AMD GPU and works as intended.
This is caused by an incorrect ROCm version; see #7146. I'm not familiar with Docker, but I assume changing the URL on line 41 of the Dockerfile to "https://download.pytorch.org/whl/rocm6.1" should fix the issue.
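If you want to test that theory without rebuilding the image, the equivalent fix for a bare-metal install should be reinstalling torch from the ROCm 6.1 wheel index (a sketch; the exact package set InvokeAI pins may differ):
pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1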