[bug]: Invoke refuses to use my RX 7600 XT GPU
Is there an existing issue for this problem?
- [x] I have searched the existing issues
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
RX 7600 XT
GPU VRAM
16GB
Version number
5.5.0
Browser
Firefox 134.0
Python dependencies
{
"accelerate": "1.0.1",
"compel": "2.0.2",
"cuda": null,
"diffusers": "0.31.0",
"numpy": "1.26.3",
"opencv": "4.9.0.80",
"onnx": "1.16.1",
"pillow": "10.2.0",
"python": "3.11.11",
"torch": "2.4.1+rocm6.1",
"torchvision": "0.19.1+rocm6.1",
"transformers": "4.46.3",
"xformers": null
}
What happened
Every time I try to generate an image, I get this error:
Server Error
RuntimeError: HIP error: invalid device function HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing AMD_SERIALIZE_KERNEL=3 Compile with `TORCH_USE_HIP_DSA` to...
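The message itself suggests serializing kernel launches to get a trustworthy stack trace, so I can relaunch like this if more detail helps (a sketch; the invoke/bin path is just a placeholder for wherever invokeai-web lives in your install):
# serialize HIP kernel launches so the error surfaces at the real call site
AMD_SERIALIZE_KERNEL=3 invoke/bin/invokeai-web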
What you expected to happen
I expected image generation to start.
How to reproduce the problem
On my setup, every image generation attempt produces this error. A CPU-only (no GPU) configuration works as expected... and, as expected, is very slow.
Additional context
I have seen several bug reports mentioning ROCm, but I didn't find anything really comparable. Note that I'm a complete newbie at AI hosting, so I might be missing something pretty basic.
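In case it helps triage, here are two sanity checks I can run from inside the InvokeAI virtual environment (assuming the rocminfo tool from the ROCm packages is installed):
rocminfo | grep -m1 gfx     # reports the card's gfx target, e.g. gfx1102 for the RX 7600 XT
python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'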
Full specs of my server are:
root@ikea:~# lshw -short
H/W path Device Class Description
================================================================
system MS-7C91 (To be filled by O.E.M.)
/0 bus MPG B550 GAMING EDGE WIFI (MS-7C91)
/0/0 memory 64KiB BIOS
/0/10 memory 32GiB System Memory
/0/10/0 memory 2667 MHz (0.4 ns) [empty]
/0/10/1 memory 16GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
/0/10/2 memory 2667 MHz (0.4 ns) [empty]
/0/10/3 memory 16GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
/0/13 memory 1MiB L1 cache
/0/14 memory 8MiB L2 cache
/0/15 memory 64MiB L3 cache
/0/16 processor AMD Ryzen 9 5950X 16-Core Processor
/0/100 bridge Starship/Matisse Root Complex
/0/100/0.2 generic Starship/Matisse IOMMU
/0/100/1.1 bridge Starship/Matisse GPP Bridge
/0/100/1.1/0 /dev/nvme0 storage CT2000P2SSD8
/0/100/1.1/0/0 hwmon0 disk NVMe disk
/0/100/1.1/0/2 /dev/ng0n1 disk NVMe disk
/0/100/1.1/0/1 /dev/nvme0n1 disk 2TB NVMe disk
/0/100/1.1/0/1/1 /dev/nvme0n1p1 volume 511MiB Windows FAT volume
/0/100/1.1/0/1/2 /dev/nvme0n1p2 volume 201GiB EXT4 volume
/0/100/1.1/0/1/3 /dev/nvme0n1p3 volume 1023MiB Linux swap volume
/0/100/1.1/0/1/4 /dev/nvme0n1p4 volume 1660GiB EXT4 volume
/0/100/1.2 bridge Starship/Matisse GPP Bridge
/0/100/1.2/0 bus 500 Series Chipset USB 3.1 XHCI Controller
/0/100/1.2/0/0 usb1 bus xHCI Host Controller
/0/100/1.2/0/0/2 bus USB2.0 Hub
/0/100/1.2/0/0/8 input6 input MSI MYSTIC LIGHT
/0/100/1.2/0/0/9 communication AX200 Bluetooth
/0/100/1.2/0/1 usb2 bus xHCI Host Controller
/0/100/1.2/0.1 storage 500 Series Chipset SATA Controller
/0/100/1.2/0.2 bridge 500 Series Chipset Switch Upstream Port
/0/100/1.2/0.2/8 bridge Advanced Micro Devices, Inc. [AMD]
/0/100/1.2/0.2/8/0 wlo1 network Wi-Fi 6 AX200
/0/100/1.2/0.2/9 bridge Advanced Micro Devices, Inc. [AMD]
/0/100/1.2/0.2/9/0 enp42s0 network RTL8125 2.5GbE Controller
/0/100/3.1 bridge Starship/Matisse GPP Bridge
/0/100/3.1/0 bridge Navi 10 XL Upstream Port of PCI Express Switch
/0/100/3.1/0/0 /dev/fb0 bridge Navi 10 XL Downstream Port of PCI Express Switch
/0/100/3.1/0/0/0 /dev/fb0 display Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600]
/0/100/3.1/0/0/0.1 card0 multimedia Navi 31 HDMI/DP Audio
/0/100/3.1/0/0/0.1/0 input10 input HDA ATI HDMI HDMI/DP,pcm=3
/0/100/3.1/0/0/0.1/1 input11 input HDA ATI HDMI HDMI/DP,pcm=7
/0/100/3.1/0/0/0.1/2 input12 input HDA ATI HDMI HDMI/DP,pcm=8
/0/100/3.1/0/0/0.1/3 input13 input HDA ATI HDMI HDMI/DP,pcm=9
/0/100/7.1 bridge Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/7.1/0 generic Starship/Matisse PCIe Dummy Function
/0/100/8.1 bridge Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/8.1/0 generic Starship/Matisse Reserved SPP
/0/100/8.1/0.1 generic Starship/Matisse Cryptographic Coprocessor PSPCPP
/0/100/8.1/0.3 bus Matisse USB 3.0 Host Controller
/0/100/8.1/0.3/0 usb3 bus xHCI Host Controller
/0/100/8.1/0.3/0/1 input0 input CX 2.4G Receiver System Control
/0/100/8.1/0.3/1 usb4 bus xHCI Host Controller
/0/100/8.1/0.4 card1 multimedia Starship/Matisse HD Audio Controller
/0/100/8.1/0.4/0 input14 input HDA Digital PCBeep
/0/100/8.1/0.4/1 input15 input HD-Audio Generic Rear Mic
/0/100/8.1/0.4/2 input16 input HD-Audio Generic Front Mic
/0/100/8.1/0.4/3 input17 input HD-Audio Generic Line
/0/100/8.1/0.4/4 input18 input HD-Audio Generic Line Out Front
/0/100/8.1/0.4/5 input19 input HD-Audio Generic Line Out Surround
/0/100/8.1/0.4/6 input20 input HD-Audio Generic Line Out CLFE
/0/100/8.1/0.4/7 input21 input HD-Audio Generic Front Headphone
/0/100/14 bus FCH SMBus Controller
/0/100/14.3 bridge FCH LPC Bridge
/0/100/14.3/0 system PnP device PNP0c01
/0/100/14.3/1 system PnP device PNP0c02
/0/100/14.3/2 system PnP device PNP0b00
/0/100/14.3/3 system PnP device PNP0c02
/0/100/14.3/4 system PnP device PNP0c02
/0/101 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/102 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/103 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/104 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/105 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/106 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/107 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/108 bridge Matisse/Vermeer Data Fabric: Device 18h; Function 0
/0/109 bridge Matisse/Vermeer Data Fabric: Device 18h; Function 1
/0/10a bridge Matisse/Vermeer Data Fabric: Device 18h; Function 2
/0/10b bridge Matisse/Vermeer Data Fabric: Device 18h; Function 3
/0/10c bridge Matisse/Vermeer Data Fabric: Device 18h; Function 4
/0/10d bridge Matisse/Vermeer Data Fabric: Device 18h; Function 5
/0/10e bridge Matisse/Vermeer Data Fabric: Device 18h; Function 6
/0/10f bridge Matisse/Vermeer Data Fabric: Device 18h; Function 7
/1 input7 input Power Button
/2 input8 input Power Button
/3 input9 input PC Speaker
root@ikea:~#
Discord username
mcon
I have the same issue with my RX 6700 XT on Arch.
[180412:0202/233234.609809:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 1 times!
[180412:0202/233242.280897:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 2 times!
[180412:0202/233242.281545:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 3 times!
Starting up...
Started Invoke process with PID: 180577
amdgpu.ids: No such file or directory
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cextension.py", line 85, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cextension.py", line 64, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (libvtkFiltersTexture.so.1: cannot open shared object file: No such file or directory).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2025-02-02 23:33:15,760]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
[2025-02-02 23:33:16,528]::[InvokeAI]::INFO --> Using torch device: AMD Radeon Graphics
[2025-02-02 23:33:16,665]::[InvokeAI]::INFO --> cuDNN version: 3001000
[2025-02-02 23:33:16,784]::[InvokeAI]::INFO --> InvokeAI version 5.6.0
[2025-02-02 23:33:16,784]::[InvokeAI]::INFO --> Root directory = /run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI
[2025-02-02 23:33:16,785]::[InvokeAI]::INFO --> Initializing database at /run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/databases/invokeai.db
[2025-02-02 23:33:16,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 9200.00 MB. Heuristics applied: [1, 3].
[2025-02-02 23:33:16,905]::[InvokeAI]::INFO --> Pruned 1 finished queue items
[2025-02-02 23:33:19,957]::[InvokeAI]::INFO --> Cleaned database (freed 0.04MB)
[2025-02-02 23:33:19,957]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-02-02 23:33:19,961]::[InvokeAI]::INFO --> Executing queue item 2, session 57837bd5-451a-4b7d-98cf-77af221ee952
[2025-02-02 23:33:57,539]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:unet' (UNet2DConditionModel) onto cuda device in 32.53s. Total model size: 4897.05MB, VRAM: 4897.05MB (100.0%)
[2025-02-02 23:33:57,924]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:scheduler' (DDPMScheduler) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-02-02 23:33:58,448]::[InvokeAI]::ERROR --> Error while invoking session 57837bd5-451a-4b7d-98cf-77af221ee952, invocation d372c6e3-d7e1-4f1f-8f27-3a277ceba8a6 (denoise_latents): HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:33:58,448]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 824, in invoke
return self._old_invoke(context)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/itachi/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 1078, in _old_invoke
timesteps, init_timestep, scheduler_step_kwargs = self.init_scheduler(
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in init_scheduler
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in <lambda>
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
.............
[2025-02-02 23:35:12,417]::[InvokeAI]::INFO --> Executing queue item 5, session a3cea2be-230e-47a3-a75b-07fd01150a82
[2025-02-02 23:35:12,447]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:unet' (UNet2DConditionModel) onto cuda device in 0.00s. Total model size: 4897.05MB, VRAM: 4897.05MB (100.0%)
[2025-02-02 23:35:12,449]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:scheduler' (DDPMScheduler) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-02-02 23:35:12,459]::[InvokeAI]::ERROR --> Error while invoking session a3cea2be-230e-47a3-a75b-07fd01150a82, invocation 2dfa2473-3dca-46d9-a2be-288795f10772 (denoise_latents): HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:35:12,459]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 824, in invoke
return self._old_invoke(context)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/itachi/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 1078, in _old_invoke
timesteps, init_timestep, scheduler_step_kwargs = self.init_scheduler(
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in init_scheduler
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in <lambda>
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:35:12,818]::[InvokeAI]::INFO --> Graph stats: a3cea2be-230e-47a3-a75b-07fd01150a82
Node Calls Seconds VRAM Used
sdxl_model_loader 1 0.000s 4.881G
sdxl_compel_prompt 2 0.001s 4.881G
collect 2 0.001s 4.881G
noise 1 0.016s 4.881G
denoise_latents 1 0.015s 4.882G
TOTAL GRAPH EXECUTION TIME: 0.032s
TOTAL GRAPH WALL TIME: 0.035s
RAM used by InvokeAI process: 5.91G (+0.000G)
RAM used to load models: 4.78G
VRAM in use: 4.881G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 6.31/0.00G
Same exact issue also but with a RX 6900XT...
I solved (somehow) my problem by installing InvokeAI and THEN:
- removing torch, torchvision and bitsandbytes
- installing the three (plus pytorch-triton-rocm) from the PyTorch site.
This is my full start script (adjust for your GPU):
#!/bin/bash
set -x -e
script_path=$(readlink -f "$0" 2>/dev/null || realpath "$0" 2>/dev/null || echo "$0")
sdir="$(dirname "${script_path}")"
here="$(cd "$sdir" && pwd)"
echo "The path of this script is: $script_path ($here)"
user=$(ls -ld "$script_path" | awk '{print $3}')
home=$(getent passwd "$user" | cut -d: -f6)
echo "Home directory of $user is $home"
VENV="invoke"
# Check InvokeAI is installed in the virtual environment
if [ -x "$VENV/bin/invokeai-web" ]
then
echo "InvokeAI is already instaled, skipping..."
else
# check Virtual Environment exists
if [ -x "$VENV/bin/python" ]
then
echo "Virtual Environment at '$VENV' already present, skipping..."
else
echo "Creating basic Virtual Environment at '$VENV'..."
PYTHON="python3.11"
CACHE="$here"
# prepare environment
$PYTHON -m venv $VENV
fi
# Activate virtual environment
source "$VENV/bin/activate"
# Install InvokeAI in Virtual Environment
echo "Installing InvokeAI in Virtual Environment at '$VENV'..."
REPO=https://download.pytorch.org/whl/nightly/rocm6.3
$VENV/bin/pip install --extra-index-url $REPO invokeai
# restore right version of pytorch-triton-rocm, torch and torchvision
pip uninstall pytorch-triton-rocm torch torchvision bitsandbytes --yes
pip install pytorch-triton-rocm torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.3
# install multi-backend "bitsandbytes"
if [ -d "$here/bitsandbytes" ]
then
echo "Multi-backend 'bitsandbytes' already present, skipping..."
else
echo "Compiling Multi-backend 'bitsandbytes'..."
(
cd "$here"
# Install bitsandbytes from source
# Clone bitsandbytes repo, ROCm backend is currently enabled on multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
# Install dependencies
pip install ".[dev]"
# Compile & install
#sudo apt-get install -y build-essential cmake # install build tools dependencies, unless present
cmake -DCOMPUTE_BACKEND=hip -S . # Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
make
)
fi
echo "Installing Multi-backend 'bitsandbytes'..."
pip install "$here/bitsandbytes" # `-e` for "editable" install, when developing BNB (otherwise leave that out)
fi
# start InvokeAI
export PYTORCH_ROCM_ARCH=gfx1102
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export INVOKEAI_ROOT=~/invokeai
export GPU_DRIVER=rocm
$VENV/bin/invokeai-web
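For anyone adapting this: the PYTORCH_ROCM_ARCH / HSA_OVERRIDE_GFX_VERSION values above match my gfx1102 card (RX 7600 XT). A quick way to find your own target, assuming rocminfo is installed:
rocminfo | grep -m1 -o 'gfx[0-9a-f]*'   # e.g. gfx1030 on the RX 6800/6900 XT family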
@mcondarelli , are you able to use all the features in Invoke?
I am very new to InvokeAI so I have NO idea about "all the features", but I can do a lot of things with no errors; at least:
- generate images from prompts with SD1.x, SDXL and FLUX
- do simple image-to-image
- use and modify workflows
- train a simple SD1.5 LoRA
I haven't tried upscaling yet.
Things definitely not working:
- training SDXL LoRAs
I opened a few tickets against ROCm and bitsandbytes, so not "everything is working".
If you need more info, you should be more specific.
I am fully willing to run tests on my setup and share the results.
The official installer, for some reason, installs a version of bitsandbytes that doesn't support ROCm as a backend. I've been swapping it out for ROCm's fork of bitsandbytes, which of course does. But since I built it myself and my distro is on ROCm 6.3, I then have to switch torch, torchvision, and pytorch-triton-rocm to the versions compatible with ROCm 6.3. Basically the same thing mcondarelli is doing. I haven't figured out how to get patchmatch working with it. Hope this gets fixed soon.
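For reference, the swap boils down to something like this (a sketch run inside the InvokeAI venv, assuming the same ROCm 6.3 nightly index mcondarelli's script uses):
# drop the builds pulled in by the installer
pip uninstall --yes torch torchvision pytorch-triton-rocm bitsandbytes
# reinstall the ROCm 6.3 builds from the PyTorch nightly index
pip install --index-url https://download.pytorch.org/whl/nightly/rocm6.3 torch torchvision pytorch-triton-rocm
# bitsandbytes then comes from a ROCm-capable build, e.g. the multi-backend-refactor branch compiled as in the script above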
I solved (somehow) my problem installing InvokeAI and THEN: * removing torch, torchvision and bitsandbytes * installing the three (plus pytorch-triton-rocm) from Pytorch site.
Thank you, it worked for me :)
After doing a repair upgrade to Invoke v5.10.1 using launcher v1.5.0, my AMD RX 6800 is now being used without installing custom versions of anything. Despite this, there are still bitsandbytes errors on launch.
Starting up...
Started Invoke process with PID: 67175
amdgpu.ids: No such file or directory
[2025-04-25 16:17:10,549]::[InvokeAI]::INFO --> PyTorch CUDA memory allocator: native
[2025-04-25 16:17:10,552]::[InvokeAI]::INFO --> Using torch device: AMD Radeon Graphics
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 85, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 64, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
[2025-04-25 16:17:11,712]::[InvokeAI]::INFO --> cuDNN version: 3002000
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (/usr/lib64/libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2025-04-25 16:17:12,779]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
[2025-04-25 16:17:13,075]::[InvokeAI]::INFO --> Loading node pack clothing-mask-node
[2025-04-25 16:17:13,077]::[InvokeAI]::INFO --> Loading node pack simple-skin-detection-node
[2025-04-25 16:17:13,079]::[InvokeAI]::INFO --> Loading node pack adapters-linked-nodes
[2025-04-25 16:17:13,088]::[InvokeAI]::INFO --> Loaded 3 node packs from /InvokeAI/nodes: clothing-mask-node, simple-skin-detection-node, adapters-linked-nodes
[2025-04-25 16:17:13,096]::[InvokeAI]::INFO --> InvokeAI version 5.10.1
[2025-04-25 16:17:13,096]::[InvokeAI]::INFO --> Root directory = /InvokeAI
[2025-04-25 16:17:13,097]::[InvokeAI]::INFO --> Initializing database at /InvokeAI/databases/invokeai.db
[2025-04-25 16:17:13,098]::[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 12272.00 MB. Heuristics applied: [1, 2].
[2025-04-25 16:17:13,151]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
/InvokeAI/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'get_token_permission' (from 'huggingface_hub.hf_api') is deprecated and will be removed from version '1.0'. Permissions are more complex than when `get_token_permission` was first introduced. OAuth and fine-grain tokens allows for more detailed permissions. If you need to know the permissions associated with a token, please use `whoami` and check the `'auth'` key.
warnings.warn(warning_message, FutureWarning)
Thanks for making InvokeAI. It's great. But things could really be a bit easier for new users, IMHO:
- The script from @mcondarelli should be part of the default installation. Without it, nothing really works.
- There should be a big warning that the AppImage won't work if you have a Radeon GPU and that you need to install the app manually. I wasted a lot of time before realizing that it just won't work otherwise.
- There should also be a "Software requirements" step next to "Hardware requirements" on the installation page saying that you need to install ROCm (rocm-hip-sdk) if you have a Radeon GPU.
- By the way, I had to run "invoke/bin/pip install torch torchvision" before the script worked.