
AMD support is completely broken - no load is placed on GPU

Expro opened this issue 1 year ago · 4 comments

LocalAI version:

Tested from 2.6.x up to the most recent commit.

Environment, CPU architecture, OS, and Version: x64, Fedora 39

Describe the bug

Successfully compiled LocalAI with both the CLBlast and hipBLAS backends for llama.cpp. With DEBUG=true the logs say layers are offloaded to the GPU, but that is not true: GPU monitoring tools show 0% load the whole time. Placing other workloads on the GPU does show load in the same monitoring tools, so the monitoring itself is not the problem.

To Reproduce

  1. Compile LocalAI for a single backend, CLBlast or hipBLAS (see the build sketch after this list).
  2. Launch it on a machine with an AMD GPU, the amdgpu driver, and ROCm installed.
  3. Observe that the logs say "layers offloaded", but no load is placed on the GPU; only the CPU is utilized.
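
For reference, a sketch of the single-backend builds in step 1, assuming the stock LocalAI Makefile targets (BUILD_TYPE values have varied between releases, so verify against your checkout):

```sh
# Single-backend builds (sketch; confirm BUILD_TYPE names in your Makefile)
make BUILD_TYPE=clblas build    # OpenCL via CLBlast
make BUILD_TYPE=hipblas build   # ROCm via hipBLAS
```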

Expected behavior

Load is placed on the GPU.

EDIT: It seems the bug only triggers when LocalAI is built for a single backend. I rebuilt LocalAI for all backends and this time it works, even though it uses the same backend that was compiled in the single-backend build.
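
For comparison, a sketch of the full rebuild that worked, under the same Makefile assumption as above:

```sh
# Rebuild with all backends enabled (no BUILD_TYPE restriction)
make clean
make build
```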


Considering that AMD GPUs give cheaper access to a bigger pool of VRAM, it would be very beneficial to have them properly supported.

Expro · Feb 06 '24 07:02

@Expro did you set up gpu_layers in the model file? https://localai.io/features/gpu-acceleration/#model-configuration
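
For anyone landing here, a minimal model YAML along the lines of that docs page; the model name, file, and layer count below are placeholders:

```yaml
# Sketch of a model config in the models/ directory (placeholder values)
name: my-model
parameters:
  model: llama-2-7b.Q4_K_M.gguf   # placeholder model file
f16: true
gpu_layers: 35                    # number of layers to offload to the GPU
```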

Maybe we should just expose that option from the CLI, or default to a high number, since that seems harmless when there is no GPU.

mudler · Feb 07 '24 08:02

I did set up gpu_layers in the model file, even though the documentation states in at least two places that gpu_layers is only used with cuBLAS, so not for AMD. I swapped between the single-backend and multi-backend builds without touching the models, and one of them ran on the GPU while the other didn't.

Defaulting to a high number of layers seems like a good idea, and so does exposing it through an environment variable and the CLI.

Expro · Feb 07 '24 12:02

I'd also set threads to 0, and ensure your GPU layers are set to something like 100-120 in your case, to keep the work off the CPU. I don't have an AMD GPU to reproduce this issue, so I'm taking a bit of a stab in the dark; a sketch of those settings is below.
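
A minimal sketch of those suggested settings in the model YAML, assuming the same config format linked above (the exact values are guesses, per the caveat):

```yaml
threads: 0        # let the backend decide; keeps inference off the CPU
gpu_layers: 120   # high enough to offload every layer of the model
```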

TheDarkTrumpet · Feb 08 '24 10:02

This is the same sort of issue that I have had for a while; it is adjacent to https://github.com/mudler/LocalAI/issues/1592

  • Limited success building on bare metal: it builds on openSUSE Leap 15.4 and I can run llama.cpp directly, but there is some issue with the LocalAI <-> llama.cpp calls that prevents LocalAI from working.
  • Still no success at all in Docker on Debian 12, Ubuntu 22.04, or openSUSE Leap 15.4; these installs were only ever performed manually.
  • Recent builds on Arch compile but fail to execute in the same manner as noted in this issue, with 0% GPU usage.

I will be rebuilding this workstation soon and may have time after next week to do some more build tests and debugging.

jamiemoller · Feb 14 '24 00:02

This issue should be considered resolved; I have been using my Radeon VII continuously for at least two months now without issue.

jtwolfe · Jun 01 '24 08:06

Right - closing then, and thanks @jtwolfe for confirming!

mudler · Jun 01 '24 13:06