[bug]: invokeai-rocm container doesn't support GPUs
Is there an existing issue for this problem?
- [X] I have searched the existing issues
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
RX 7900 XTX, RX 7700S
GPU VRAM
26GB, 8GB
Version number
invokeai-rocm
Browser
Firefox
Python dependencies
No response
What happened
I am trying to run the container version with these arguments:
--device /dev/kfd --device /dev/dri --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm
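For reference, the complete invocation would look something like this (assuming Docker; the --group-add flags are an assumption based on common ROCm container setups, since the container user typically needs access to the video/render device groups):
docker run --device /dev/kfd --device /dev/dri --group-add video --group-add render --volume ./:/invokeai -p 9090:9090 --name invokeai ghcr.io/invoke-ai/invokeai:main-rocm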
However, it doesn't seem to detect either of my AMD GPUs and falls back to the CPU. It also reports that bitsandbytes was compiled without GPU support:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-10-02 05:54:19,891]::[InvokeAI]::INFO --> Patchmatch initialized
[2024-10-02 05:54:20,552]::[InvokeAI]::INFO --> Using torch device: CPU
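A quick way to check whether the torch build inside the container can see the GPU at all is to query torch directly (a diagnostic sketch, assuming python is on the container's PATH and the container is named invokeai as above):
docker exec -it invokeai python -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"
On a working ROCm build, torch.version.hip prints a version string and torch.cuda.is_available() returns True; with the CPU fallback seen here, it prints None and False.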
Ollama works fine with ROCm, so I am not sure why this doesn't, or how I can get it working.
What you expected to happen
I expect the container to start, use ROCm, and detect the GPUs
How to reproduce the problem
No response
Additional context
No response
Discord username
No response
Bare metal is affected too
Using the installer with the ROCm option on an RX 6700 XT GPU.
It works fine on < 5.0.2, but starting with 5.1 I get this error:
The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Clearing the pip cache doesn't help.
I made a systemd service:
[Unit]
Description=InvokeAI
[Service]
ExecStart=/home/user/.local/invokeai/.venv/bin/invokeai-web
Environment="HSA_OVERRIDE_GFX_VERSION=10.3.0"
Environment="PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.95,max_split_size_mb:512"
Environment="INVOKEAI_ROOT=/home/user/.local/invokeai"
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=default.target
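For completeness, the service is loaded and started the usual way (assuming the unit is saved as a user unit, e.g. at ~/.config/systemd/user/invokeai.service, which matches the WantedBy=default.target above):
systemctl --user daemon-reload
systemctl --user enable --now invokeai.service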
Same result on bare metal: 5.1.1 doesn't detect or use the ROCm device. An in-place install of 5.0.2 restores AMD support.
None of the [version]-rocm containers work for me, even before version 5.0.2. I'm using podman with the proper arguments (I know podman is not directly supported, but I have Ollama running with this same configuration, and I also run the bare-metal installer via a rootless distrobox container, which has worked fine). Here are my arguments:
Image=ghcr.io/invoke-ai/invokeai:main-rocm
ContainerName=invokeai
AutoUpdate=registry
Environment=INVOKEAI_ROOT=/var/lib/invokeai
PublishPort=9091:9090
Volume=/var/home/user/.local/share/invokeai:/var/lib/invokeai
SecurityLabelDisable=true
AddDevice=/dev/dri
AddDevice=/dev/kfd
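For anyone not using Quadlet, the equivalent direct invocation should be roughly the following (a sketch derived from the options above; the --label flag stands in for AutoUpdate=registry):
podman run -d --name invokeai --security-opt label=disable --device /dev/dri --device /dev/kfd -e INVOKEAI_ROOT=/var/lib/invokeai -p 9091:9090 -v /var/home/user/.local/share/invokeai:/var/lib/invokeai --label io.containers.autoupdate=registry ghcr.io/invoke-ai/invokeai:main-rocm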
Using the 5.1.1 bare-metal installer also defaults to the CPU, but the 5.0.2 bare-metal installer (again under a rootless distrobox container) detects my AMD GPU and works as intended.
This is caused by an incorrect ROCm version; see #7146. I'm not familiar with Docker, but I assume changing the URL on line 41 of the Dockerfile to "https://download.pytorch.org/whl/rocm6.1" should fix the issue.
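If you want to test that theory without rebuilding the image, the equivalent fix for a bare-metal install should be reinstalling torch from the ROCm 6.1 wheel index (a sketch; the exact package set InvokeAI pins may differ):
pip install --force-reinstall torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1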