[bug]: Invoke refuses to use my RX 7600 XT GPU
Is there an existing issue for this problem?
- [x] I have searched the existing issues
Operating system
Linux
GPU vendor
AMD (ROCm)
GPU model
RX 7600 XT
GPU VRAM
16GB
Version number
5.5.0
Browser
Firefox 134.0
Python dependencies
{
"accelerate": "1.0.1",
"compel": "2.0.2",
"cuda": null,
"diffusers": "0.31.0",
"numpy": "1.26.3",
"opencv": "4.9.0.80",
"onnx": "1.16.1",
"pillow": "10.2.0",
"python": "3.11.11",
"torch": "2.4.1+rocm6.1",
"torchvision": "0.19.1+rocm6.1",
"transformers": "4.46.3",
"xformers": null
}
What happened
Every time I try to generate an image, I get this error:
Server Error
RuntimeError: HIP error: invalid device function HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing AMD_SERIALIZE_KERNEL=3 Compile with `TORCH_USE_HIP_DSA` to...
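The message itself suggests serializing kernel launches to get a trustworthy stack trace, so I can relaunch like this if more detail helps (a sketch; the invoke/bin path is just a placeholder for wherever invokeai-web lives in your install):
# serialize HIP kernel launches so the error surfaces at the real call site
AMD_SERIALIZE_KERNEL=3 invoke/bin/invokeai-web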
What you expected to happen
I expected image generation to start.
How to reproduce the problem
On my setup, every image generation attempt produces this error. A CPU-only (no GPU) configuration works as expected... and, as expected, is very slow.
Additional context
I have seen several bug reports mentioning ROCm, but I didn't find anything really comparable. Note that I'm a complete newbie at AI hosting, so I might be missing something pretty basic.
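In case it helps triage, here are two sanity checks I can run from inside the InvokeAI virtual environment (assuming the rocminfo tool from the ROCm packages is installed):
rocminfo | grep -m1 gfx     # reports the card's gfx target, e.g. gfx1102 for the RX 7600 XT
python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))'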
Full specs of my server are:
root@ikea:~# lshw -short
H/W path Device Class Description
================================================================
system MS-7C91 (To be filled by O.E.M.)
/0 bus MPG B550 GAMING EDGE WIFI (MS-7C91)
/0/0 memory 64KiB BIOS
/0/10 memory 32GiB System Memory
/0/10/0 memory 2667 MHz (0.4 ns) [empty]
/0/10/1 memory 16GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
/0/10/2 memory 2667 MHz (0.4 ns) [empty]
/0/10/3 memory 16GiB DIMM DDR4 Synchronous Unbuffered (Unregistered) 2667 MHz (0.4 ns)
/0/13 memory 1MiB L1 cache
/0/14 memory 8MiB L2 cache
/0/15 memory 64MiB L3 cache
/0/16 processor AMD Ryzen 9 5950X 16-Core Processor
/0/100 bridge Starship/Matisse Root Complex
/0/100/0.2 generic Starship/Matisse IOMMU
/0/100/1.1 bridge Starship/Matisse GPP Bridge
/0/100/1.1/0 /dev/nvme0 storage CT2000P2SSD8
/0/100/1.1/0/0 hwmon0 disk NVMe disk
/0/100/1.1/0/2 /dev/ng0n1 disk NVMe disk
/0/100/1.1/0/1 /dev/nvme0n1 disk 2TB NVMe disk
/0/100/1.1/0/1/1 /dev/nvme0n1p1 volume 511MiB Windows FAT volume
/0/100/1.1/0/1/2 /dev/nvme0n1p2 volume 201GiB EXT4 volume
/0/100/1.1/0/1/3 /dev/nvme0n1p3 volume 1023MiB Linux swap volume
/0/100/1.1/0/1/4 /dev/nvme0n1p4 volume 1660GiB EXT4 volume
/0/100/1.2 bridge Starship/Matisse GPP Bridge
/0/100/1.2/0 bus 500 Series Chipset USB 3.1 XHCI Controller
/0/100/1.2/0/0 usb1 bus xHCI Host Controller
/0/100/1.2/0/0/2 bus USB2.0 Hub
/0/100/1.2/0/0/8 input6 input MSI MYSTIC LIGHT
/0/100/1.2/0/0/9 communication AX200 Bluetooth
/0/100/1.2/0/1 usb2 bus xHCI Host Controller
/0/100/1.2/0.1 storage 500 Series Chipset SATA Controller
/0/100/1.2/0.2 bridge 500 Series Chipset Switch Upstream Port
/0/100/1.2/0.2/8 bridge Advanced Micro Devices, Inc. [AMD]
/0/100/1.2/0.2/8/0 wlo1 network Wi-Fi 6 AX200
/0/100/1.2/0.2/9 bridge Advanced Micro Devices, Inc. [AMD]
/0/100/1.2/0.2/9/0 enp42s0 network RTL8125 2.5GbE Controller
/0/100/3.1 bridge Starship/Matisse GPP Bridge
/0/100/3.1/0 bridge Navi 10 XL Upstream Port of PCI Express Switch
/0/100/3.1/0/0 /dev/fb0 bridge Navi 10 XL Downstream Port of PCI Express Switch
/0/100/3.1/0/0/0 /dev/fb0 display Navi 33 [Radeon RX 7600/7600 XT/7600M XT/7600S/7700S / PRO W7600]
/0/100/3.1/0/0/0.1 card0 multimedia Navi 31 HDMI/DP Audio
/0/100/3.1/0/0/0.1/0 input10 input HDA ATI HDMI HDMI/DP,pcm=3
/0/100/3.1/0/0/0.1/1 input11 input HDA ATI HDMI HDMI/DP,pcm=7
/0/100/3.1/0/0/0.1/2 input12 input HDA ATI HDMI HDMI/DP,pcm=8
/0/100/3.1/0/0/0.1/3 input13 input HDA ATI HDMI HDMI/DP,pcm=9
/0/100/7.1 bridge Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/7.1/0 generic Starship/Matisse PCIe Dummy Function
/0/100/8.1 bridge Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B]
/0/100/8.1/0 generic Starship/Matisse Reserved SPP
/0/100/8.1/0.1 generic Starship/Matisse Cryptographic Coprocessor PSPCPP
/0/100/8.1/0.3 bus Matisse USB 3.0 Host Controller
/0/100/8.1/0.3/0 usb3 bus xHCI Host Controller
/0/100/8.1/0.3/0/1 input0 input CX 2.4G Receiver System Control
/0/100/8.1/0.3/1 usb4 bus xHCI Host Controller
/0/100/8.1/0.4 card1 multimedia Starship/Matisse HD Audio Controller
/0/100/8.1/0.4/0 input14 input HDA Digital PCBeep
/0/100/8.1/0.4/1 input15 input HD-Audio Generic Rear Mic
/0/100/8.1/0.4/2 input16 input HD-Audio Generic Front Mic
/0/100/8.1/0.4/3 input17 input HD-Audio Generic Line
/0/100/8.1/0.4/4 input18 input HD-Audio Generic Line Out Front
/0/100/8.1/0.4/5 input19 input HD-Audio Generic Line Out Surround
/0/100/8.1/0.4/6 input20 input HD-Audio Generic Line Out CLFE
/0/100/8.1/0.4/7 input21 input HD-Audio Generic Front Headphone
/0/100/14 bus FCH SMBus Controller
/0/100/14.3 bridge FCH LPC Bridge
/0/100/14.3/0 system PnP device PNP0c01
/0/100/14.3/1 system PnP device PNP0c02
/0/100/14.3/2 system PnP device PNP0b00
/0/100/14.3/3 system PnP device PNP0c02
/0/100/14.3/4 system PnP device PNP0c02
/0/101 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/102 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/103 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/104 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/105 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/106 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/107 bridge Starship/Matisse PCIe Dummy Host Bridge
/0/108 bridge Matisse/Vermeer Data Fabric: Device 18h; Function 0
/0/109 bridge Matisse/Vermeer Data Fabric: Device 18h; Function 1
/0/10a bridge Matisse/Vermeer Data Fabric: Device 18h; Function 2
/0/10b bridge Matisse/Vermeer Data Fabric: Device 18h; Function 3
/0/10c bridge Matisse/Vermeer Data Fabric: Device 18h; Function 4
/0/10d bridge Matisse/Vermeer Data Fabric: Device 18h; Function 5
/0/10e bridge Matisse/Vermeer Data Fabric: Device 18h; Function 6
/0/10f bridge Matisse/Vermeer Data Fabric: Device 18h; Function 7
/1 input7 input Power Button
/2 input8 input Power Button
/3 input9 input PC Speaker
root@ikea:~#
Discord username
mcon
I have the same issue with my RX 6700 XT on Arch.
[180412:0202/233234.609809:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 1 times!
[180412:0202/233242.280897:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 2 times!
[180412:0202/233242.281545:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 3 times!
Starting up...
Started Invoke process with PID: 180577
amdgpu.ids: No such file or directory
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cextension.py", line 85, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cextension.py", line 64, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (libvtkFiltersTexture.so.1: cannot open shared object file: No such file or directory).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2025-02-02 23:33:15,760]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
[2025-02-02 23:33:16,528]::[InvokeAI]::INFO --> Using torch device: AMD Radeon Graphics
[2025-02-02 23:33:16,665]::[InvokeAI]::INFO --> cuDNN version: 3001000
[2025-02-02 23:33:16,784]::[InvokeAI]::INFO --> InvokeAI version 5.6.0
[2025-02-02 23:33:16,784]::[InvokeAI]::INFO --> Root directory = /run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI
[2025-02-02 23:33:16,785]::[InvokeAI]::INFO --> Initializing database at /run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/databases/invokeai.db
[2025-02-02 23:33:16,818]::[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 9200.00 MB. Heuristics applied: [1, 3].
[2025-02-02 23:33:16,905]::[InvokeAI]::INFO --> Pruned 1 finished queue items
[2025-02-02 23:33:19,957]::[InvokeAI]::INFO --> Cleaned database (freed 0.04MB)
[2025-02-02 23:33:19,957]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2025-02-02 23:33:19,961]::[InvokeAI]::INFO --> Executing queue item 2, session 57837bd5-451a-4b7d-98cf-77af221ee952
[2025-02-02 23:33:57,539]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:unet' (UNet2DConditionModel) onto cuda device in 32.53s. Total model size: 4897.05MB, VRAM: 4897.05MB (100.0%)
[2025-02-02 23:33:57,924]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:scheduler' (DDPMScheduler) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-02-02 23:33:58,448]::[InvokeAI]::ERROR --> Error while invoking session 57837bd5-451a-4b7d-98cf-77af221ee952, invocation d372c6e3-d7e1-4f1f-8f27-3a277ceba8a6 (denoise_latents): HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:33:58,448]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 824, in invoke
return self._old_invoke(context)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/itachi/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 1078, in _old_invoke
timesteps, init_timestep, scheduler_step_kwargs = self.init_scheduler(
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in init_scheduler
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in <lambda>
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
.............
[2025-02-02 23:35:12,417]::[InvokeAI]::INFO --> Executing queue item 5, session a3cea2be-230e-47a3-a75b-07fd01150a82
[2025-02-02 23:35:12,447]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:unet' (UNet2DConditionModel) onto cuda device in 0.00s. Total model size: 4897.05MB, VRAM: 4897.05MB (100.0%)
[2025-02-02 23:35:12,449]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '907a4c90-54e0-467d-9346-879f2c70d47a:scheduler' (DDPMScheduler) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)
[2025-02-02 23:35:12,459]::[InvokeAI]::ERROR --> Error while invoking session a3cea2be-230e-47a3-a75b-07fd01150a82, invocation 2dfa2473-3dca-46d9-a2be-288795f10772 (denoise_latents): HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:35:12,459]::[InvokeAI]::ERROR --> Traceback (most recent call last):
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/services/session_processor/session_processor_default.py", line 129, in run_node
output = invocation.invoke_internal(context=context, services=self._services)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/baseinvocation.py", line 300, in invoke_internal
output = self.invoke(context)
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 824, in invoke
return self._old_invoke(context)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/itachi/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 1078, in _old_invoke
timesteps, init_timestep, scheduler_step_kwargs = self.init_scheduler(
^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in init_scheduler
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/run/media/itachi/DATA_SATA_4TB/SD2/InvokeAI/.venv/lib/python3.11/site-packages/invokeai/app/invocations/denoise_latents.py", line 729, in <lambda>
t_start_idx = len(list(filter(lambda ts: ts >= t_start_val, _timesteps)))
^^^^^^^^^^^^^^^^^
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
[2025-02-02 23:35:12,818]::[InvokeAI]::INFO --> Graph stats: a3cea2be-230e-47a3-a75b-07fd01150a82
Node Calls Seconds VRAM Used
sdxl_model_loader 1 0.000s 4.881G
sdxl_compel_prompt 2 0.001s 4.881G
collect 2 0.001s 4.881G
noise 1 0.016s 4.881G
denoise_latents 1 0.015s 4.882G
TOTAL GRAPH EXECUTION TIME: 0.032s
TOTAL GRAPH WALL TIME: 0.035s
RAM used by InvokeAI process: 5.91G (+0.000G)
RAM used to load models: 4.78G
VRAM in use: 4.881G
RAM cache statistics:
Model cache hits: 2
Model cache misses: 0
Models cached: 4
Models cleared from cache: 0
Cache high water mark: 6.31/0.00G
Same exact issue also but with a RX 6900XT...
I solved (somehow) my problem by installing InvokeAI and THEN:
- removing torch, torchvision and bitsandbytes
- installing the three (plus pytorch-triton-rocm) from the PyTorch site.
This is my full start script (adjust for your GPU):
#!/bin/bash
set -x -e
script_path=$(readlink -f "$0" 2>/dev/null || realpath "$0" 2>/dev/null || echo "$0")
sdir="$(dirname "${script_path}")"
here="$(cd "$sdir" && pwd)"
echo "The path of this script is: $script_path ($here)"
user=$(ls -ld "$script_path" | awk '{print $3}')
home=$(getent passwd "$user" | cut -d: -f6)
echo "Home directory of $user is $home"
VENV="invoke"
# Check InvokeAI is installed in the virtual environment
if [ -x "$VENV/bin/invokeai-web" ]
then
echo "InvokeAI is already instaled, skipping..."
else
# check Virtual Environment exists
if [ -x "$VENV/bin/python" ]
then
echo "Virtual Environment at '$VENV' already present, skipping..."
else
echo "Creating basic Virtual Environment at '$VENV'..."
PYTHON="python3.11"
CACHE="$here"
# prepare environment
$PYTHON -m venv $VENV
fi
# Activate virtual environment
source "$VENV/bin/activate"
# Install InvokeAI in Virtual Environment
echo "Installing InvokeAI in Virtual Environment at '$VENV'..."
REPO=https://download.pytorch.org/whl/nightly/rocm6.3
$VENV/bin/pip install --extra-index-url $REPO invokeai
# restore right version of pytorch-triton-rocm, torch and torchvision
pip uninstall pytorch-triton-rocm torch torchvision bitsandbytes --yes
pip install pytorch-triton-rocm torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm6.3
# install multi-backend "bitsandbytes"
if [ -d "$here/bitsandbytes" ]
then
echo "Multi-backend 'bitsandbytes' already present, skipping..."
else
echo "Compiling Multi-backend 'bitsandbytes'..."
(
cd "$here"
# Install bitsandbytes from source
# Clone bitsandbytes repo, ROCm backend is currently enabled on multi-backend-refactor branch
git clone -b multi-backend-refactor https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/
# Install dependencies
pip install ".[dev]"
# Compile & install
#sudo apt-get install -y build-essential cmake # install build tools dependencies, unless present
cmake -DCOMPUTE_BACKEND=hip -S . # Use -DBNB_ROCM_ARCH="gfx90a;gfx942" to target specific gpu arch
make
)
fi
echo "Installing Multi-backend 'bitsandbytes'..."
pip install "$here/bitsandbytes" # `-e` for "editable" install, when developing BNB (otherwise leave that out)
fi
# start InvokeAI
export PYTORCH_ROCM_ARCH=gfx1102
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export INVOKEAI_ROOT=~/invokeai
export GPU_DRIVER=rocm
$VENV/bin/invokeai-web
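For anyone adapting this: the PYTORCH_ROCM_ARCH / HSA_OVERRIDE_GFX_VERSION values above match my gfx1102 card (RX 7600 XT). A quick way to find your own target, assuming rocminfo is installed:
rocminfo | grep -m1 -o 'gfx[0-9a-f]*'   # e.g. gfx1030 on the RX 6800/6900 XT family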
@mcondarelli , are you able to use all the features in Invoke?
I am very new to InvokeAI so I have NO idea about "all the features", but I can do a lot of things with no errors; at least:
- generate images from prompts with SD1.x, SDXL and FLUX
- do simple image-to-image
- use and modify workflows
- train a simple SD1.5 LoRA
I haven't tried upscaling yet.
Things definitely not working:
- training SDXL LoRAs
I opened a few tickets against ROCm and bitsandbytes, so not "everything is working".
If you need more info, you should be more specific.
I am fully willing to run tests on my setup and share the results.
The official installer, for some reason, installs a version of bitsandbytes that doesn't support ROCm as a backend. I've been swapping it out for ROCm's fork of bitsandbytes, which of course does. But since I built it myself and my distro is on ROCm 6.3, I then have to switch torch, torchvision, and pytorch-triton-rocm to the versions compatible with ROCm 6.3. Basically the same thing mcondarelli is doing. I haven't figured out how to get patchmatch working with it. Hope this gets fixed soon.
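For reference, the swap boils down to something like this (a sketch run inside the InvokeAI venv, assuming the same ROCm 6.3 nightly index mcondarelli's script uses):
# drop the builds pulled in by the installer
pip uninstall --yes torch torchvision pytorch-triton-rocm bitsandbytes
# reinstall the ROCm 6.3 builds from the PyTorch nightly index
pip install --index-url https://download.pytorch.org/whl/nightly/rocm6.3 torch torchvision pytorch-triton-rocm
# bitsandbytes then comes from a ROCm-capable build, e.g. the multi-backend-refactor branch compiled as in the script above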
I solved (somehow) my problem installing InvokeAI and THEN: * removing torch, torchvision and bitsandbytes * installing the three (plus pytorch-triton-rocm) from Pytorch site.
Thank you, it worked for me :)
After doing a repair upgrade to Invoke v5.10.1 using launcher v1.5.0, my AMD RX 6800 is now being used without installing custom versions of anything. Despite this, there are still bitsandbytes errors on launch.
Starting up...
Started Invoke process with PID: 67175
amdgpu.ids: No such file or directory
[2025-04-25 16:17:10,549]::[InvokeAI]::INFO --> PyTorch CUDA memory allocator: native
[2025-04-25 16:17:10,552]::[InvokeAI]::INFO --> Using torch device: AMD Radeon Graphics
Could not load bitsandbytes native library: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 85, in <module>
lib = get_native_library()
^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cextension.py", line 64, in get_native_library
cuda_specs = get_cuda_specs()
^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 39, in get_cuda_specs
cuda_version_string=(get_cuda_version_string()),
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 29, in get_cuda_version_string
major, minor = get_cuda_version_tuple()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/InvokeAI/.venv/lib/python3.12/site-packages/bitsandbytes/cuda_specs.py", line 24, in get_cuda_version_tuple
major, minor = map(int, torch.version.cuda.split("."))
^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'
CUDA Setup failed despite CUDA being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes and open an issue at: https://github.com/bitsandbytes-foundation/bitsandbytes/issues
[2025-04-25 16:17:11,712]::[InvokeAI]::INFO --> cuDNN version: 3002000
>> patchmatch.patch_match: ERROR - patchmatch failed to load or compile (/usr/lib64/libtiff.so.6: undefined symbol: jpeg12_write_raw_data, version LIBJPEG_8.0).
>> patchmatch.patch_match: INFO - Refer to https://invoke-ai.github.io/InvokeAI/installation/060_INSTALL_PATCHMATCH/ for installation instructions.
[2025-04-25 16:17:12,779]::[InvokeAI]::INFO --> Patchmatch not loaded (nonfatal)
[2025-04-25 16:17:13,075]::[InvokeAI]::INFO --> Loading node pack clothing-mask-node
[2025-04-25 16:17:13,077]::[InvokeAI]::INFO --> Loading node pack simple-skin-detection-node
[2025-04-25 16:17:13,079]::[InvokeAI]::INFO --> Loading node pack adapters-linked-nodes
[2025-04-25 16:17:13,088]::[InvokeAI]::INFO --> Loaded 3 node packs from /InvokeAI/nodes: clothing-mask-node, simple-skin-detection-node, adapters-linked-nodes
[2025-04-25 16:17:13,096]::[InvokeAI]::INFO --> InvokeAI version 5.10.1
[2025-04-25 16:17:13,096]::[InvokeAI]::INFO --> Root directory = /InvokeAI
[2025-04-25 16:17:13,097]::[InvokeAI]::INFO --> Initializing database at /InvokeAI/databases/invokeai.db
[2025-04-25 16:17:13,098]::[ModelManagerService]::INFO --> [MODEL CACHE] Calculated model RAM cache size: 12272.00 MB. Heuristics applied: [1, 2].
[2025-04-25 16:17:13,151]::[InvokeAI]::INFO --> Invoke running on http://127.0.0.1:9090 (Press CTRL+C to quit)
/InvokeAI/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'get_token_permission' (from 'huggingface_hub.hf_api') is deprecated and will be removed from version '1.0'. Permissions are more complex than when `get_token_permission` was first introduced. OAuth and fine-grain tokens allows for more detailed permissions. If you need to know the permissions associated with a token, please use `whoami` and check the `'auth'` key.
warnings.warn(warning_message, FutureWarning)
Thanks for making InvokeAI. It's great. But things could really be a bit easier for new users, IMHO:
- The script from @mcondarelli should be part of the default installation. Without it, nothing really works.
- There should be a big warning that the AppImage won't work if you have a Radeon GPU and that you need to install the app manually. I wasted a lot of time before realizing that it just won't work otherwise.
- There should also be a "Software requirements" step next to "Hardware requirements" on the installation page saying that you need to install ROCm (rocm-hip-sdk) if you have a Radeon GPU.
- By the way, I had to run "invoke/bin/pip install torch torchvision" before the script worked.