Can't seem to get Tabby to run on GPU
Describe the bug
(This might be me doing something wrong, and not a bug!)
I can't seem to get Tabby to run on my GPU (Radeon RX 6600 XT), neither with ROCm (which I believe is unsupported for my device) nor with Vulkan, which I believe should be supported.
Whenever I run Tabby (tabby serve --model DeepseekCoder-1.3B --chat-model Qwen2-1.5B-Instruct --device vulkan) and ask something in the Web UI, I check my CPU and GPU usage (using btop and amdgpu_top respectively) and see my CPU usage spiking with almost no effect on the GPU. (This is the case both when running v0.15.0-rc.2 and when compiling v0.15.0-rc.3 myself.)
If I try to use v0.14.0 (same options), I instead get this:
2024-08-08T11:19:34.328360Z WARN llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:111: <embedding>: warning: see main README.md for information on enabling GPU BLAS support
Information about your version
See above.
Information about your GPU
From vulkaninfo:
Devices:
========
GPU0:
apiVersion = 1.3.270
driverVersion = 2.0.294
vendorID = 0x1002
deviceID = 0x73ff
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon RX 6600 XT
driverID = DRIVER_ID_AMD_PROPRIETARY
driverName = AMD proprietary driver
driverInfo = (AMD proprietary shader compiler)
conformanceVersion = 1.3.3.1
deviceUUID = 00000000-2800-0000-0000-000000000000
driverUUID = 414d442d-4c49-4e55-582d-445256000000
GPU1:
apiVersion = 1.3.278
driverVersion = 24.1.5
vendorID = 0x1002
deviceID = 0x73ff
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon RX 6600 XT (RADV NAVI23)
driverID = DRIVER_ID_MESA_RADV
driverName = radv
driverInfo = Mesa 24.1.5
conformanceVersion = 1.3.0.0
deviceUUID = 00000000-2800-0000-0000-000000000000
driverUUID = 414d442d-4d45-5341-2d44-525600000000
From rocminfo (if it helps):
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
Runtime Ext Version: 1.4
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 3600 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 3600 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3600
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32763224(0x1f3ed58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32763224(0x1f3ed58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32763224(0x1f3ed58) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1032
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 2048(0x800) KB
L3: 32768(0x8000) KB
Chip ID: 29695(0x73ff)
ASIC Revision: 0(0x0)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2900
BDFID: 10240
Internal Node ID: 1
Compute Unit: 32
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 118
SDMA engine uCode:: 76
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1032
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Additional context
Full terminal output when starting Tabby (v0.15.0-rc.3, compiled myself):
✓ $ cargo run --features vulkan --release serve --model DeepseekCoder-1.3B --chat-model Qwen2-1.5B-Instruct --device vulkan --parallelism 2
warning: function `tracing_context` is never used
--> ee/tabby-webserver/src/hub.rs:15:4
|
15 | fn tracing_context() -> tarpc::context::Context {
| ^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `tabby-webserver` (lib) generated 1 warning
warning: function `chat_completions_utoipa` is never used
--> crates/tabby/src/routes/chat.rs:29:14
|
29 | pub async fn chat_completions_utoipa(_request: Json<serde_json::Value>) -> Statu...
| ^^^^^^^^^^^^^^^^^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: `tabby` (bin "tabby") generated 1 warning
Finished release [optimized] target(s) in 0.31s
Running `target/release/tabby serve --model DeepseekCoder-1.3B --chat-model Qwen2-1.5B-Instruct --device vulkan --parallelism 2`
2024-08-08T11:27:47.846355Z DEBUG tabby_common::config: crates/tabby-common/src/config.rs:35: Config file /home/USER/.tabby/config.toml not found, apply default configuration
2024-08-08T11:27:48.439867Z DEBUG tabby::serve: crates/tabby/src/serve.rs:411: Starting server, this might take a few minutes...
2024-08-08T11:27:48.494017Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:124: Waiting for llama-server <embedding> to start...
2024-08-08T11:27:48.634321Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:132: llama-server <embedding> started successfully
2024-08-08T11:27:48.649300Z DEBUG tabby_common::config: crates/tabby-common/src/config.rs:35: Config file /home/USER/.tabby/config.toml not found, apply default configuration
2024-08-08T11:27:48.649713Z DEBUG tabby::services::tantivy: crates/tabby/src/services/tantivy.rs:33: Index is ready, enabling search...
2024-08-08T11:27:48.948439Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:124: Waiting for llama-server <chat> to start...
2024-08-08T11:27:49.463618Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:132: llama-server <chat> started successfully
2024-08-08T11:27:49.659853Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:124: Waiting for llama-server <completion> to start...
2024-08-08T11:27:50.422822Z DEBUG llama_cpp_server::supervisor: crates/llama-cpp-server/src/supervisor.rs:132: llama-server <completion> started successfully
████████╗ █████╗ ██████╗ ██████╗ ██╗ ██╗
╚══██╔══╝██╔══██╗██╔══██╗██╔══██╗╚██╗ ██╔╝
██║ ███████║██████╔╝██████╔╝ ╚████╔╝
██║ ██╔══██║██╔══██╗██╔══██╗ ╚██╔╝
██║ ██║ ██║██████╔╝██████╔╝ ██║
╚═╝ ╚═╝ ╚═╝╚═════╝ ╚═════╝ ╚═╝
📄 Version 0.15.0-rc.3
🚀 Listening at 0.0.0.0:8080
I got that too, but only with the prebuilt Tabby binary; when I compile 0.14.0 myself, it works.
By the way, ROCm does work on the RX 6600.
Tabby only supports a single GPU. To utilize multiple GPUs, you can start multiple Tabby instances and set CUDA_VISIBLE_DEVICES (for CUDA) or HIP_VISIBLE_DEVICES (for ROCm) accordingly. If a similar GPU is supported by ROCm, you can set the HSA_OVERRIDE_GFX_VERSION variable to its version: for example, 10.3.0 for RDNA2 and 11.0.0 for RDNA3.
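Concretely, that would look something like this (a sketch only; the device indices and ports are hypothetical, and tabby's --port flag is assumed):
# one Tabby instance per GPU, each on its own port
HIP_VISIBLE_DEVICES=0 tabby serve --model DeepseekCoder-1.3B --device rocm --port 8080
HIP_VISIBLE_DEVICES=1 tabby serve --model DeepseekCoder-1.3B --device rocm --port 8081
# the RX 6600 XT is gfx1032 (RDNA2), so spoof the supported RDNA2 target:
HSA_OVERRIDE_GFX_VERSION=10.3.0 tabby serve --model DeepseekCoder-1.3B --device rocm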
Right! I never tried to build v0.14.0 myself. I will see if I have time to try that later. If so, I will come back with the results.
I realized I didn't compile for ROCm before, only for Vulkan. When I try to compile with --features=rocm, I get:
error: failed to run custom build command for `llama-cpp-server v0.14.0 (/tmp/tabby-src/crates/llama-cpp-server)`
Caused by:
process didn't exit successfully: `/tmp/tabby-src/target/release/build/llama-cpp-server-1b3738a0281592c2/build-script-build` (exit status: 101)
--- stdout
CMAKE_TOOLCHAIN_FILE_x86_64-unknown-linux-gnu = None
CMAKE_TOOLCHAIN_FILE_x86_64_unknown_linux_gnu = None
HOST_CMAKE_TOOLCHAIN_FILE = None
CMAKE_TOOLCHAIN_FILE = None
CMAKE_GENERATOR_x86_64-unknown-linux-gnu = None
CMAKE_GENERATOR_x86_64_unknown_linux_gnu = None
HOST_CMAKE_GENERATOR = None
CMAKE_GENERATOR = None
CMAKE_PREFIX_PATH_x86_64-unknown-linux-gnu = None
CMAKE_PREFIX_PATH_x86_64_unknown_linux_gnu = None
HOST_CMAKE_PREFIX_PATH = None
CMAKE_PREFIX_PATH = None
CMAKE_x86_64-unknown-linux-gnu = None
CMAKE_x86_64_unknown_linux_gnu = None
HOST_CMAKE = None
CMAKE = None
running: cd "/tmp/tabby-src/target/release/build/llama-cpp-server-40e8fadd415341a4/out/build" && CMAKE_PREFIX_PATH="" "cmake" "/tmp/tabby-src/crates/llama-cpp-server/./llama.cpp" "-DLLAMA_NATIVE=OFF" "-DBUILD_SHARED_LIBS=OFF" "-DINS_ENB=ON" "-DLLAMA_HIPBLAS=ON" "-DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang" "-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++" "-DAMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx940;gfx941;gfx942;gfx1010;gfx1012;gfx1030;gfx1031;gfx1100;gfx1101;gfx1102;gfx1103" "-DCMAKE_INSTALL_PREFIX=/tmp/tabby-src/target/release/build/llama-cpp-server-40e8fadd415341a4/out" "-DCMAKE_C_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_CXX_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_FLAGS= -ffunction-sections -fdata-sections -fPIC -m64" "-DCMAKE_ASM_COMPILER=/usr/bin/cc" "-DCMAKE_BUILD_TYPE=Release"
-- The C compiler identification is unknown
-- The CXX compiler identification is unknown
-- Configuring incomplete, errors occurred!
--- stderr
CMake Error at CMakeLists.txt:2 (project):
The CMAKE_C_COMPILER:
/opt/rocm/llvm/bin/clang
is not a full path to an existing compiler tool.
Tell CMake where to find the compiler by setting either the environment
variable "CC" or the CMake cache entry CMAKE_C_COMPILER to the full path to
the compiler, or to the compiler name if it is in the PATH.
CMake Error at CMakeLists.txt:2 (project):
The CMAKE_CXX_COMPILER:
/opt/rocm/llvm/bin/clang++
is not a full path to an existing compiler tool.
Tell CMake where to find the compiler by setting either the environment
variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
to the compiler, or to the compiler name if it is in the PATH.
thread 'main' panicked at /home/USER/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cmake-0.1.50/src/lib.rs:1098:5:
command did not execute successfully, got: exit status: 1
build script failed, must exit now
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I use Gentoo and I should (as far as I know) have ROCm installed, but maybe I am missing something... I'll have to look into this more.
(This happens on both v0.14.0 and v0.15.0-rc.3...)
A bit of a hacky solution that at least gets Tabby to compile:
sudo mkdir /opt/rocm
sudo ln -sv /usr/lib/llvm/18 /opt/rocm/llvm
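This works because the build script hardcodes -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang (visible in the cmake invocation above), so setting CC/CXX would not override it; the symlink just makes that hardcoded path resolve. A quick sanity check that the path now points at a real compiler:
/opt/rocm/llvm/bin/clang --version
/opt/rocm/llvm/bin/clang++ --version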
However, aside from the chat seeming slower to generate responses, there's no difference. It still seems to run on the CPU when using:
HSA_OVERRIDE_GFX_VERSION=10.3.0 cargo run --features rocm --release serve --model DeepseekCoder-1.3B --chat-model Qwen2-1.5B-Instruct --device rocm
I also tried adding HCC_AMDGPU_TARGET=gfx1032 to make it target my GPU rather than my CPU. I don't know if that is how that environment variable is supposed to be used, but it did not work.
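(For reference, this is roughly how I'm checking GPU utilization while a chat request is in flight; rocm-smi is assumed to be available alongside amdgpu_top:)
watch -n1 rocm-smi --showuse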
The fix for rocm is merged, the vulkan fix is probably similar (I haven't tested it): https://github.com/TabbyML/tabby/issues/2810#issuecomment-2283356626
Does this fix also work on Windows? I'm running Tabby 0.18.0, and using --device rocm still ends up running the models on my CPU, not the GPU.
We are not distributing Windows binaries for ROCm at the moment, so it won't work.
I recommend using the Vulkan backend on Windows if you have a non-NVIDIA GPU.
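For example, with the Vulkan Windows build, something along these lines should work (the executable name is assumed to match the release archive):
tabby.exe serve --model DeepseekCoder-1.3B --chat-model Qwen2-1.5B-Instruct --device vulkan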